git.vger.kernel.org archive mirror
* [RFC] Implement ref content consistency check
@ 2024-08-13 14:18 shejialuo
  2024-08-15 10:19 ` karthik nayak
  2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
  0 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-08-13 14:18 UTC
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Karthik Nayak

Hi All:

We have already set up the infrastructure for ref consistency checks.
However, when establishing that infrastructure we only added the ref
name check:

  https://lore.kernel.org/git/ZrSqMmD-quQ18a9F@ArchLinux.localdomain/

We actually already have a patch that implements the ref content
consistency check, but we encountered some problems during review.
The intention of this RFC is to settle which content we should check
and to what extent.

To summarize:

1. For a regular ref that has trailing garbage, we should warn the
user. This is the simplest situation; we could rely on
"parse_loose_ref_contents" to do this.
2. For a symref, we could also rely on "parse_loose_ref_contents" to
get the "pointee", and then check the location of the "pointee", the
name of the "pointee" and the file type of the "pointee".
3. For the symbolic ref, we could follow the idea of 2.
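The split between items 1 and 2 can be illustrated with a simplified,
hypothetical stand-in for parse_loose_ref_contents(). The names
"classify_ref_sketch" and "enum ref_kind" are made up for this sketch,
and a fixed hex length "hexsz" is assumed where real Git uses the
repository's hash algorithm. A symref yields its pointee, while a regular
ref yields a pointer to whatever follows the hex object name:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum ref_kind { REF_KIND_BAD, REF_KIND_REGULAR, REF_KIND_SYMREF };

/*
 * Hypothetical, simplified stand-in for parse_loose_ref_contents().
 * For a symref, *rest points at the pointee (leading whitespace
 * skipped, trailing content kept). For a regular ref, *rest points
 * just past the hex object name, i.e. at any trailing content.
 */
static enum ref_kind classify_ref_sketch(const char *buf, size_t hexsz,
					 const char **rest)
{
	if (!strncmp(buf, "ref:", 4)) {
		buf += 4;
		while (isspace((unsigned char)*buf))
			buf++;
		*rest = buf;
		return REF_KIND_SYMREF;
	}

	for (size_t i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return REF_KIND_BAD; /* also stops at an early NUL */

	/* the object name must be followed by NUL or whitespace */
	if (buf[hexsz] != '\0' && !isspace((unsigned char)buf[hexsz]))
		return REF_KIND_BAD;

	*rest = buf + hexsz;
	return REF_KIND_REGULAR;
}
```

Item 1 then only needs to inspect *rest for a regular ref, and item 2
only needs the pointee returned for a symref.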

However, Patrick raised a question:

> In case the ref ends with a newline, should we check that the next
> character is `\0`? Otherwise, it may contain multiple lines, which is
> not allowed for a normal ref.
>
> Also, shouldn't the ref always end with a newline?

For symrefs, I don't think we have a spec. From my experiments, a
symref may end with a newline, no newline, or even multiple newlines,
and it may also contain multiple spaces. However, the following is a
bad symref:

  ref: refs/heads/main garbage

I think we should fully discuss what to check here before I
implement the code.

Thanks,
Jialuo


* Re: [RFC] Implement ref content consistency check
  2024-08-13 14:18 [RFC] Implement ref content consistency check shejialuo
@ 2024-08-15 10:19 ` karthik nayak
  2024-08-15 13:37   ` shejialuo
  2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: karthik nayak @ 2024-08-15 10:19 UTC
  To: shejialuo, git; +Cc: Junio C Hamano, Patrick Steinhardt


shejialuo <shejialuo@gmail.com> writes:

> Hi All:
>
> We have already set up the infrastructure for ref consistency checks.
> However, when establishing that infrastructure we only added the ref
> name check:
>
>   https://lore.kernel.org/git/ZrSqMmD-quQ18a9F@ArchLinux.localdomain/
>
> We actually already have a patch that implements the ref content
> consistency check, but we encountered some problems during review.
> The intention of this RFC is to settle which content we should check
> and to what extent.
>
> To summarize:
>
> 1. For a regular ref that has trailing garbage, we should warn the
> user. This is the simplest situation; we could rely on
> "parse_loose_ref_contents" to do this.
> 2. For a symref, we could also rely on "parse_loose_ref_contents" to
> get the "pointee", and then check the location of the "pointee", the
> name of the "pointee" and the file type of the "pointee".
> 3. For the symbolic ref, we could follow the idea of 2.
>

Just to understand clearly, when you're talking about 'symbolic ref' you
are referring to symbolic links?

I ask because, as per our documentation in
'Documentation/git-symbolic-ref.txt':

  In the past, `.git/HEAD` was a symbolic link pointing at
  `refs/heads/master`.  When we wanted to switch to another branch, we
  did `ln -sf refs/heads/newbranch .git/HEAD`, and when we wanted to
  find out which branch we are on, we did `readlink .git/HEAD`. But
  symbolic links are not entirely portable, so they are now deprecated
  and symbolic refs (as described above) are used by default.

> However, Patrick raised a question:
>
>> In case the ref ends with a newline, should we check that the next
>> character is `\0`? Otherwise, it may contain multiple lines, which is
>> not allowed for a normal ref.
>>
>> Also, shouldn't the ref always end with a newline?
>
> For symrefs, I don't think we have a spec. From my experiments, a
> symref may end with a newline, no newline, or even multiple newlines,
> and it may also contain multiple spaces. However, the following is a
> bad symref:
>
>   ref: refs/heads/main garbage
>
> I think we should fully discuss what to check here before I
> implement the code.
>

Agreed, in refs/files-backend.c:create_symref_lock, we write symrefs as
"ref: %s\n" so it makes sense to validate that there is nothing extra.



* Re: [RFC] Implement ref content consistency check
  2024-08-15 10:19 ` karthik nayak
@ 2024-08-15 13:37   ` shejialuo
  2024-08-16  9:06     ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-08-15 13:37 UTC
  To: karthik nayak; +Cc: git, Junio C Hamano, Patrick Steinhardt

On Thu, Aug 15, 2024 at 03:19:50AM -0700, karthik nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > Hi All:
> >
> > We have already set up the infrastructure for ref consistency checks.
> > However, when establishing that infrastructure we only added the ref
> > name check:
> >
> >   https://lore.kernel.org/git/ZrSqMmD-quQ18a9F@ArchLinux.localdomain/
> >
> > We actually already have a patch that implements the ref content
> > consistency check, but we encountered some problems during review.
> > The intention of this RFC is to settle which content we should check
> > and to what extent.
> >
> > To summarize:
> >
> > 1. For a regular ref that has trailing garbage, we should warn the
> > user. This is the simplest situation; we could rely on
> > "parse_loose_ref_contents" to do this.
> > 2. For a symref, we could also rely on "parse_loose_ref_contents" to
> > get the "pointee", and then check the location of the "pointee", the
> > name of the "pointee" and the file type of the "pointee".
> > 3. For the symbolic ref, we could follow the idea of 2.
> >
> 
> Just to understand clearly, when you're talking about 'symbolic ref' you
> are referring to symbolic links?
> 

Sorry for the confusion. I meant symbolic links here.

> > However, Patrick raised a question:
> >
> >> In case the ref ends with a newline, should we check that the next
> >> character is `\0`? Otherwise, it may contain multiple lines, which is
> >> not allowed for a normal ref.
> >>
> >> Also, shouldn't the ref always end with a newline?
> >
> > For symrefs, I don't think we have a spec. From my experiments, a
> > symref may end with a newline, no newline, or even multiple newlines,
> > and it may also contain multiple spaces. However, the following is a
> > bad symref:
> >
> >   ref: refs/heads/main garbage
> >
> > I think we should fully discuss what to check here before I
> > implement the code.
> >
> 
> Agreed, in refs/files-backend.c:create_symref_lock, we write symrefs as
> "ref: %s\n" so it makes sense to validate that there is nothing extra.

Yes, we should do this. I will implement the code and then send the
patches to the mailing list.

Thanks



* Re: [RFC] Implement ref content consistency check
  2024-08-15 13:37   ` shejialuo
@ 2024-08-16  9:06     ` Patrick Steinhardt
  2024-08-16 16:39       ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-16  9:06 UTC
  To: shejialuo; +Cc: karthik nayak, git, Junio C Hamano

On Thu, Aug 15, 2024 at 09:37:44PM +0800, shejialuo wrote:
> On Thu, Aug 15, 2024 at 03:19:50AM -0700, karthik nayak wrote:
> > shejialuo <shejialuo@gmail.com> writes:
> > 
> > > Hi All:
> > >
> > > We have already set up the infrastructure for ref consistency checks.
> > > However, when establishing that infrastructure we only added the ref
> > > name check:
> > >
> > >   https://lore.kernel.org/git/ZrSqMmD-quQ18a9F@ArchLinux.localdomain/
> > >
> > > We actually already have a patch that implements the ref content
> > > consistency check, but we encountered some problems during review.
> > > The intention of this RFC is to settle which content we should check
> > > and to what extent.
> > >
> > > To summarize:
> > >
> > > 1. For a regular ref that has trailing garbage, we should warn the
> > > user. This is the simplest situation; we could rely on
> > > "parse_loose_ref_contents" to do this.
> > > 2. For a symref, we could also rely on "parse_loose_ref_contents" to
> > > get the "pointee", and then check the location of the "pointee", the
> > > name of the "pointee" and the file type of the "pointee".
> > > 3. For the symbolic ref, we could follow the idea of 2.
> > >
> > 
> > Just to understand clearly, when you're talking about 'symbolic ref' you
> > are referring to symbolic links?
> > 
> 
> Sorry for the confusion. I meant symbolic links here.

Wait, is it really a symbolic link? I don't think so; you were actually
talking about symbolic refs correctly. The fact that symbolic refs have
been implemented as a symbolic link in the past (and still can be used
for that purpose) is rather an implementation detail. But the overall
context, and what we actually want to check on disk, is a symbolic ref
in its modern incarnation.

And checking the format of both normal and symbolic refs does make sense
in my opinion.

> > > However, Patrick raised a question:
> > >
> > >> In case the ref ends with a newline, should we check that the next
> > >> character is `\0`? Otherwise, it may contain multiple lines, which is
> > >> not allowed for a normal ref.
> > >>
> > >> Also, shouldn't the ref always end with a newline?
> > >
> > > For symrefs, I don't think we have a spec. From my experiments, a
> > > symref may end with a newline, no newline, or even multiple newlines,
> > > and it may also contain multiple spaces. However, the following is a
> > > bad symref:
> > >
> > >   ref: refs/heads/main garbage
> > >
> > > I think we should fully discuss what to check here before I
> > > implement the code.
> > >
> > 
> > Agreed, in refs/files-backend.c:create_symref_lock, we write symrefs as
> > "ref: %s\n" so it makes sense to validate that there is nothing extra.
> 
> Yes, we should do this. I will implement the code and then send the
> patches to the mailing list.

Agreed. We have to exclude pseudorefs (FETCH_HEAD, MERGE_HEAD, see
gitglossary(7)) from these checks, as those _are_ allowed to contain
extra data. But no other reference should carry more data than that.
Namely, a regular ref should always be "hex * hash_len + \n", while a
symbolic ref should always be "ref: $valid_refname\n".

A ref that does not conform to this is not a properly formatted
reference and thus worth being warned about.
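The strict format described above can be expressed as two small
predicates. This is only a sketch: the function names are hypothetical,
"hexsz" stands in for the hash algorithm's hex length (40 for SHA-1, 64
for SHA-256), and both validating the refname itself (check_refname_format()
in Git) and the pseudoref exemption are left out:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Regular ref: exactly hexsz hex digits followed by a single "\n". */
static bool is_strict_regular_ref(const char *buf, size_t hexsz)
{
	for (size_t i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return false; /* also stops at an early NUL */
	return buf[hexsz] == '\n' && buf[hexsz + 1] == '\0';
}

/*
 * Symbolic ref: "ref: <name>\n" with a non-empty name, no spaces in
 * the name, and nothing after the single trailing newline.
 */
static bool is_strict_symref(const char *buf)
{
	const char *name, *nl;

	if (strncmp(buf, "ref: ", 5))
		return false;
	name = buf + 5;
	nl = strchr(name, '\n');
	if (!nl || nl == name || nl[1] != '\0')
		return false;
	return memchr(name, ' ', (size_t)(nl - name)) == NULL;
}
```

Under these predicates, the forms reported earlier in the thread as
tolerated in practice (no newline, several newlines, extra spaces) would
all be flagged.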

Patrick


* Re: [RFC] Implement ref content consistency check
  2024-08-16  9:06     ` Patrick Steinhardt
@ 2024-08-16 16:39       ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-16 16:39 UTC
  To: Patrick Steinhardt; +Cc: shejialuo, karthik nayak, git

Patrick Steinhardt <ps@pks.im> writes:

>> > > 1. For a regular ref that has trailing garbage, we should warn the
>> > > user. This is the simplest situation; we could rely on
>> > > "parse_loose_ref_contents" to do this.
>> > > 2. For a symref, we could also rely on "parse_loose_ref_contents" to
>> > > get the "pointee", and then check the location of the "pointee", the
>> > > name of the "pointee" and the file type of the "pointee".
>> > > 3. For the symbolic ref, we could follow the idea of 2.
>> > 
>> > Just to understand clearly, when you're talking about 'symbolic ref' you
>> > are referring to symbolic links?
>> 
>> Sorry for the confusion. I meant symbolic links here.
>
> Wait, is it really a symbolic link? I don't think so; you were actually
> talking about symbolic refs correctly.

In #2, yes.  I think #3 is about what to do with a random symbolic
link inside or near the .git/refs/ hierarchy, which may or may not be
meant as a symref.  I agree that we should assume that the user meant
them to be used as symrefs, check their validity the same way as
textual symrefs, and complain if they look bogus.
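Treating such a symbolic link like a textual symref means reading its
target first and then applying the same pointee checks. A POSIX sketch of
that first step follows; "read_symlink_pointee" is a hypothetical helper,
and it takes relative link targets verbatim rather than resolving "../"
against the link's directory:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the target of the symlink at link_path into out (NUL-terminated).
 * Returns 0 if the target looks like a ref under "refs/", and -1 if the
 * path is not a readable symlink, the target may have been truncated,
 * or it points elsewhere.
 */
static int read_symlink_pointee(const char *link_path, char *out, size_t outsz)
{
	ssize_t len;

	if (outsz < 2)
		return -1;
	len = readlink(link_path, out, outsz - 1);
	if (len < 0 || (size_t)len == outsz - 1)
		return -1;      /* error, or target possibly truncated */
	out[len] = '\0';        /* readlink() does not NUL-terminate */
	return strncmp(out, "refs/", 5) ? -1 : 0;
}
```

The string left in "out" could then go through the same name and file
type checks as the pointee of a "ref: ..." file.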

> And checking the format of both normal and symbolic refs does make sense
> in my opinion.

Yup.


* [PATCH v1 0/4] add ref content check for files backend
  2024-08-13 14:18 [RFC] Implement ref content consistency check shejialuo
  2024-08-15 10:19 ` karthik nayak
@ 2024-08-18 15:00 ` shejialuo
  2024-08-18 15:01   ` [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro shejialuo
                     ` (4 more replies)
  1 sibling, 5 replies; 209+ messages in thread
From: shejialuo @ 2024-08-18 15:00 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This series adds ref content checks for the files backend. Following
the RFC discussion, I add three types of checks.

1. Check regular ref content. I enhance "parse_loose_ref_contents" so
that we can validate the content of a regular ref and warn the user
about trailing garbage.
2. Check symbolic ref content, including trailing garbage and the
pointee.
3. Check symlink refs by reusing the function introduced in #2.

CI passes:

  https://github.com/shejialuo/git/pull/14

Thanks,
Jialuo

shejialuo (4):
  fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro
  ref: add regular ref content check for files backend
  ref: add symbolic ref content check for files backend
  ref: add symlink ref consistency check for files backend

 Documentation/fsck-msgids.txt |  12 +++
 fsck.h                        |  10 ++
 refs.c                        |   2 +-
 refs/files-backend.c          | 188 +++++++++++++++++++++++++++++++++-
 refs/refs-internal.h          |   2 +-
 t/t0602-reffiles-fsck.sh      | 183 +++++++++++++++++++++++++++++++++
 6 files changed, 392 insertions(+), 5 deletions(-)

-- 
2.46.0



* [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro
  2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
@ 2024-08-18 15:01   ` shejialuo
  2024-08-20 16:25     ` Junio C Hamano
  2024-08-18 15:01   ` [PATCH v1 2/4] ref: add regular ref content check for files backend shejialuo
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-08-18 15:01 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So, we need to always initialize these parameters to
NULL instead of letting them point anywhere when creating a new
"fsck_ref_report" structure.

In order to conveniently create a new "fsck_ref_report", add a new macro
"FSCK_REF_REPORT_DEFAULT".
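The pattern can be shown in isolation. The struct below is a hypothetical
stand-in for struct fsck_ref_report (with "oid" typed as a plain pointer
to keep the sketch self-contained), and the default macro mirrors
FSCK_REF_REPORT_DEFAULT:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_ref_report in fsck.h. */
struct ref_report_sketch {
	const char *path;
	const void *oid;      /* const struct object_id * in Git */
	const char *referent;
};

/* Mirrors FSCK_REF_REPORT_DEFAULT: every member starts out as NULL. */
#define REF_REPORT_SKETCH_DEFAULT { \
	.path = NULL, \
	.oid = NULL, \
	.referent = NULL, \
}

/* A reporter can rely on NULL meaning "not provided". */
static int report_has_details(const struct ref_report_sketch *r)
{
	return r->oid != NULL || r->referent != NULL;
}
```

Strictly speaking, C already initializes members that a designated
initializer does not name as if they had static storage duration (so
pointers become NULL); the macro's main benefit is making the defaults
explicit and keeping them in one place.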

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 fsck.h               | 6 ++++++
 refs/files-backend.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 500b4c04d2..8894394d16 100644
--- a/fsck.h
+++ b/fsck.h
@@ -152,6 +152,12 @@ struct fsck_ref_report {
 	const char *referent;
 };
 
+#define FSCK_REF_REPORT_DEFAULT { \
+	.path = NULL, \
+	.oid = NULL, \
+	.referent = NULL, \
+}
+
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8d6ec9458d..725a4f52e3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3446,7 +3446,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.46.0



* [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
  2024-08-18 15:01   ` [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro shejialuo
@ 2024-08-18 15:01   ` shejialuo
  2024-08-20 16:49     ` Junio C Hamano
  2024-08-22  8:48     ` Patrick Steinhardt
  2024-08-18 15:01   ` [PATCH v1 3/4] ref: add symbolic " shejialuo
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-08-18 15:01 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We implicitly rely on "git-fsck(1)" to check the consistency of regular
refs. However, when parsing regular refs for the files backend, we allow
the ref content to end without a newline or to contain trailing garbage.
We should warn the user about these situations.

To provide this functionality, enhance the "git refs verify" command by
adding a consistency check for regular refs in the files backend.

Add the following three fsck messages to represent the above situations:

1. "badRefContent(ERROR)": A ref has bad content.
2. "refMissingNewline(WARN)": A valid ref does not end with a newline.
3. "trailingRefContent(WARN)": A ref has trailing contents.

In order to tell whether the ref has trailing content, add a new
parameter "trailing" to "parse_loose_ref_contents". Then introduce a new
function "files_fsck_refs_content" to check the regular refs.
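The two warnings for regular refs reduce to a small decision on the
"trailing" pointer. The sketch below restates the logic of
files_fsck_refs_content() as a standalone function; the enum and function
names are hypothetical, and "trailing" is assumed to point just past the
hex object name, as parse_loose_ref_contents() sets it in this patch:

```c
/* Possible outcomes for the content after a regular ref's object name. */
enum trailing_status {
	TRAILING_OK,              /* exactly "\n" */
	TRAILING_MISSING_NEWLINE, /* refMissingNewline (WARN) */
	TRAILING_GARBAGE,         /* trailingRefContent (WARN) */
};

static enum trailing_status check_regular_trailing(const char *trailing)
{
	if (*trailing == '\0')
		return TRAILING_MISSING_NEWLINE;
	/* anything other than a single final newline is trailing content */
	if (*trailing != '\n' || trailing[1] != '\0')
		return TRAILING_GARBAGE;
	return TRAILING_OK;
}
```

The inputs used in the new t0602-reffiles-fsck.sh test (no newline,
" garbage", "\n\n\n", "\n\n\n  garbage") fall into the same buckets.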

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 ++++
 fsck.h                        |  3 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 67 ++++++++++++++++++++++++++-
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 87 +++++++++++++++++++++++++++++++++++
 6 files changed, 166 insertions(+), 4 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..1688c2f1fe 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
@@ -170,6 +173,12 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(WARN) A valid ref does not end with a newline.
+
+`trailingRefContent`::
+	(WARN) A ref has trailing contents.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 8894394d16..975d9b9da9 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
@@ -73,6 +74,8 @@ enum fsck_msg_type {
 	FUNC(HAS_DOTDOT, WARN) \
 	FUNC(HAS_DOTGIT, WARN) \
 	FUNC(NULL_SHA1, WARN) \
+	FUNC(REF_MISSING_NEWLINE, WARN) \
+	FUNC(TRAILING_REF_CONTENT, WARN) \
 	FUNC(ZERO_PADDED_FILEMODE, WARN) \
 	FUNC(NUL_IN_COMMIT, WARN) \
 	FUNC(LARGE_PATHNAME, WARN) \
diff --git a/refs.c b/refs.c
index 74de3d3009..5e74881945 100644
--- a/refs.c
+++ b/refs.c
@@ -1758,7 +1758,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 725a4f52e3..ae71692f36 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -560,7 +560,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -597,7 +597,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -619,6 +619,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3430,6 +3434,64 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	const char *trailing = NULL;
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISREG(iter->st.st_mode)) {
+		if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+			ret = error_errno(_("%s/%s: unable to read the ref"),
+					  refs_check_dir, iter->relative_path);
+			goto cleanup;
+		}
+
+		if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+					    ref_content.buf, &oid, &referent,
+					    &type, &trailing, &failure_errno)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_BAD_REF_CONTENT,
+					      "invalid ref content");
+			goto cleanup;
+		}
+
+		if (!(type & REF_ISSYMREF)) {
+			if (*trailing == '\0') {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_REF_MISSING_NEWLINE,
+						      "missing newline");
+				goto cleanup;
+			}
+
+			if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_TRAILING_REF_CONTENT,
+						      "trailing garbage in ref");
+				goto cleanup;
+			}
+		}
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3574,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..7c1910d784 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,91 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git commit --allow-empty -m second &&
+	git checkout -b branch-2 &&
+	git tag tag-2 &&
+	git checkout -b a/b/tag-2 &&
+
+	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-1-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/branch-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\n\na" "$(git rev-parse tag-2)" > $tag_dir_prefix/tag-2-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-2-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-2-garbage &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%sx" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-1-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $tag_dir_prefix/tag-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-2-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $branch_dir_prefix/a/b/branch-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-2-bad &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.0



* [PATCH v1 3/4] ref: add symbolic ref content check for files backend
  2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
  2024-08-18 15:01   ` [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro shejialuo
  2024-08-18 15:01   ` [PATCH v1 2/4] ref: add regular ref content check for files backend shejialuo
@ 2024-08-18 15:01   ` shejialuo
  2024-08-22  8:53     ` Patrick Steinhardt
  2024-08-18 15:02   ` [PATCH v1 4/4] ref: add symlink ref consistency " shejialuo
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
  4 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-08-18 15:01 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced the checks for regular refs. There is no need
to check the consistency of the target which the symbolic ref points to.
Instead, we just check the content of the symbolic ref itself.

To check the content of a symbolic ref, create a function
"files_fsck_symref_target". It first checks whether the "pointee" is
under the "refs/" directory and then checks the "pointee" itself.

There is no specification for the content of a symbolic ref. Although
we do write "ref: %s\n" when creating a symbolic ref with the
"git-symbolic-ref(1)" command, this format is not mandatory. We still
accept symbolic refs with null (whitespace-only) trailing garbage. To be
more specific, the following are accepted:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

But we do not allow any non-null trailing garbage. The following are bad
symbolic ref contents:

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

To provide the above checks, we traverse the "pointee" and report to
the user whether it has null garbage or is missing a newline. If the
symbolic ref contains non-null garbage, we report
"FSCK_MSG_BAD_REF_CONTENT" to the user.
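The traversal can be restated as a standalone function for illustration.
This is a re-implementation of the loop in files_fsck_symref_target(),
not the patch itself; the enum is hypothetical, and "p" is assumed to
point into the pointee just after its "refs/" prefix:

```c
#include <ctype.h>

enum symref_trailing {
	SYMREF_OK,              /* exactly one trailing "\n" */
	SYMREF_MISSING_NEWLINE, /* refMissingNewline (WARN) */
	SYMREF_NULL_GARBAGE,    /* trailingRefContent (WARN) */
	SYMREF_BAD_CONTENT,     /* badRefContent (ERROR) */
};

static enum symref_trailing classify_symref_trailing(const char *p)
{
	unsigned int newline_num = 0, space_num = 0;

	for (; *p; p++) {
		/* once whitespace has been seen, anything else is garbage */
		if ((space_num || newline_num) && !isspace((unsigned char)*p))
			return SYMREF_BAD_CONTENT;
		if (*p == '\n')
			newline_num++;
		else if (*p == ' ')
			space_num++;
	}

	if (space_num || newline_num > 1)
		return SYMREF_NULL_GARBAGE;
	if (!newline_num)
		return SYMREF_MISSING_NEWLINE;
	return SYMREF_OK;
}
```

On the examples listed above, the three accepted forms give
SYMREF_NULL_GARBAGE and the two rejected ones give SYMREF_BAD_CONTENT.
As written, a tab passes the isspace() test but is counted neither as a
space nor as a newline.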

Then, we check that the name of the "pointee" is correct by using
"check_refname_format". Finally, if the "pointee_path" exists in the
file system, we ensure that it is a regular file or a symlink.
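The file-type rule for the pointee can likewise be sketched on its own.
"pointee_filetype_ok" is a hypothetical name; as in the patch, any
lstat() failure is treated as a missing pointee rather than an error:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/*
 * Returns 1 if the pointee is acceptable: missing (a symref may point
 * at a ref that does not exist yet), a regular file, or a symlink.
 * Returns 0 for anything else, e.g. a directory.
 */
static int pointee_filetype_ok(const char *pointee_path)
{
	struct stat st;

	if (lstat(pointee_path, &st) < 0)
		return 1;
	return S_ISREG(st.st_mode) || S_ISLNK(st.st_mode);
}
```

Using lstat() rather than stat() keeps a symlinked pointee from being
followed, so its own type is what gets checked.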

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 87 +++++++++++++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 52 +++++++++++++++++++++
 4 files changed, 143 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 1688c2f1fe..73587661dc 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badSymrefPointee`::
+	(ERROR) The pointee of a symref is bad.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index 975d9b9da9..985b674dd9 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_SYMREF_POINTEE, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ae71692f36..bfb8d338d2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3434,12 +3434,92 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+/*
+ * Check the symref "pointee_name" and "pointee_path". The caller should
+ * make sure that "pointee_path" is absolute. For a symbolic ref, "pointee_name"
+ * would be the content after "ref:".
+ */
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    const char *refname,
+				    struct strbuf *pointee_name,
+				    struct strbuf *pointee_path)
+{
+	unsigned int newline_num = 0;
+	unsigned int space_num = 0;
+	const char *p = NULL;
+	struct stat st;
+	int ret = 0;
+
+	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
+
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to ref outside the refs directory");
+		goto out;
+	}
+
+	while (*p != '\0') {
+		if ((space_num || newline_num) && !isspace(*p)) {
+			ret = fsck_report_ref(o, report,
+					      FSCK_MSG_BAD_REF_CONTENT,
+					      "contains non-null garbage");
+			goto out;
+		}
+
+		if (*p == '\n') {
+			newline_num++;
+		} else if (*p == ' ') {
+			space_num++;
+		}
+		p++;
+	}
+
+	if (space_num || newline_num > 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "trailing null-garbage");
+	} else if (!newline_num) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "missing newline");
+	}
+
+	strbuf_rtrim(pointee_name);
+
+	if (check_refname_format(pointee_name->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to refname with invalid format");
+	}
+
+	/*
+	 * Missing target should not be treated as any error worthy event and
+	 * not even warn. It is a common case that a symbolic ref points to a
+	 * ref that does not exist yet. If the target ref does not exist, just
+	 * skip the check for the file type.
+	 */
+	if (lstat(pointee_path->buf, &st) < 0)
+		goto out;
+
+	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to an invalid file type");
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
 				   struct dir_iterator *iter)
 {
 	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
+	struct strbuf pointee_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
@@ -3482,6 +3562,12 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 						      "trailing garbage in ref");
 				goto cleanup;
 			}
+		} else {
+			strbuf_addf(&pointee_path, "%s/%s",
+				    ref_store->gitdir, referent.buf);
+			ret = files_fsck_symref_target(o, &report, refname.buf,
+						       &referent,
+						       &pointee_path);
 		}
 	}
 
@@ -3489,6 +3575,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&pointee_path);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 7c1910d784..e8fc2ef015 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -176,4 +176,56 @@ test_expect_success 'regular ref content should be checked' '
 	test_cmp expect err
 '
 
+test_expect_success 'symbolic ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git checkout -b a/b/branch-2 &&
+
+	printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-1-no-newline &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n\n " > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" > $branch_dir_prefix/branch-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-2-bad: badSymrefPointee: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-2-bad &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.0


* [PATCH v1 4/4] ref: add symlink ref consistency check for files backend
  2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
                     ` (2 preceding siblings ...)
  2024-08-18 15:01   ` [PATCH v1 3/4] ref: add symbolic " shejialuo
@ 2024-08-18 15:02   ` shejialuo
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
  4 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-18 15:02 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced "files_fsck_symref_target". We should reuse
this function to handle symrefs that are legacy symbolic links. We
should not check for trailing garbage in symbolic links, so add a new
parameter "symbolic_link" to disable the checks that only apply to
textual symbolic refs.

We first use "strbuf_add_real_path" to resolve the symlink and get the
absolute path "pointee_path" that the symlink ref points to. Then we get
the absolute path "abs_gitdir" of the "gitdir". By stripping the
"abs_gitdir" prefix from "pointee_path", we can extract the "referent".
Thus, we can reuse the "files_fsck_symref_target" function to
seamlessly check symlink refs.
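The prefix-stripping step described above can be sketched in plain C. This is a minimal, hypothetical illustration, not the patch code: "referent_from_real_path" is a made-up name, and it assumes both paths have already been made absolute and normalized (the patch does that with "strbuf_add_real_path", "strbuf_add_absolute_path" and "strbuf_normalize_path").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: "real_path" is the fully resolved target of the
 * symlink ref and "abs_gitdir" the absolute, normalized gitdir (with or
 * without a trailing '/'). Returns the part of "real_path" below gitdir,
 * e.g. "refs/heads/main", or NULL if the target lies outside gitdir.
 */
static const char *referent_from_real_path(const char *real_path,
					   const char *abs_gitdir)
{
	size_t len;

	assert(real_path != NULL && abs_gitdir != NULL);

	/* Ignore trailing separators so "/repo/.git" and "/repo/.git/" behave alike. */
	len = strlen(abs_gitdir);
	while (len > 0 && abs_gitdir[len - 1] == '/')
		len--;

	/*
	 * Require a '/' right after the gitdir component. The patch gets the
	 * same effect by appending '/' to "abs_gitdir" before skip_prefix();
	 * it keeps "/repo/.gitx/..." from matching "/repo/.git".
	 */
	if (strncmp(real_path, abs_gitdir, len) || real_path[len] != '/')
		return NULL;
	return real_path + len + 1;
}
```

With gitdir "/repo/.git", a symlink resolving to "/repo/.git/logs/branch-bad" yields "logs/branch-bad", which "files_fsck_symref_target" then rejects because it does not start with "refs/".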

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     | 82 ++++++++++++++++++++++++++++------------
 t/t0602-reffiles-fsck.sh | 44 +++++++++++++++++++++
 2 files changed, 101 insertions(+), 25 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index bfb8d338d2..398afedaf0 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,4 +1,5 @@
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../copy.h"
 #include "../environment.h"
 #include "../gettext.h"
@@ -3437,13 +3438,15 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 /*
  * Check the symref "pointee_name" and "pointee_path". The caller should
  * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
- * would be the content after "refs:".
+ * would be the content after "refs:". For symbolic link, "pointee_name" would
+ * be the relative path against "gitdir".
  */
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    const char *refname,
 				    struct strbuf *pointee_name,
-				    struct strbuf *pointee_path)
+				    struct strbuf *pointee_path,
+				    unsigned int symbolic_link)
 {
 	unsigned int newline_num = 0;
 	unsigned int space_num = 0;
@@ -3459,34 +3462,36 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	while (*p != '\0') {
-		if ((space_num || newline_num) && !isspace(*p)) {
-			ret = fsck_report_ref(o, report,
-					      FSCK_MSG_BAD_REF_CONTENT,
-					      "contains non-null garbage");
-			goto out;
+	if (!symbolic_link) {
+		while (*p != '\0') {
+			if ((space_num || newline_num) && !isspace(*p)) {
+				ret = fsck_report_ref(o, report,
+						      FSCK_MSG_BAD_REF_CONTENT,
+						      "contains non-null garbage");
+				goto out;
+			}
+
+			if (*p == '\n') {
+				newline_num++;
+			} else if (*p == ' ') {
+				space_num++;
+			}
+			p++;
 		}
 
-		if (*p == '\n') {
-			newline_num++;
-		} else if (*p == ' ') {
-			space_num++;
+		if (space_num || newline_num > 1) {
+			ret = fsck_report_ref(o, report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "trailing null-garbage");
+		} else if (!newline_num) {
+			ret = fsck_report_ref(o, report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "missing newline");
 		}
-		p++;
-	}
 
-	if (space_num || newline_num > 1) {
-		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_TRAILING_REF_CONTENT,
-				      "trailing null-garbage");
-	} else if (!newline_num) {
-		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_REF_MISSING_NEWLINE,
-				      "missing newline");
+		strbuf_rtrim(pointee_name);
 	}
 
-	strbuf_rtrim(pointee_name);
-
 	if (check_refname_format(pointee_name->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_SYMREF_POINTEE,
@@ -3521,8 +3526,10 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
 	struct strbuf pointee_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
+	unsigned int symbolic_link = 0;
 	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
@@ -3567,8 +3574,32 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				    ref_store->gitdir, referent.buf);
 			ret = files_fsck_symref_target(o, &report, refname.buf,
 						       &referent,
-						       &pointee_path);
+						       &pointee_path,
+						       symbolic_link);
+		}
+	} else if (S_ISLNK(iter->st.st_mode)) {
+		const char *pointee_name = NULL;
+
+		symbolic_link = 1;
+
+		strbuf_add_real_path(&pointee_path, iter->path.buf);
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		if (!skip_prefix(pointee_path.buf,
+				 abs_gitdir.buf, &pointee_name)) {
+			ret = fsck_report_ref(o, &report,
+					       FSCK_MSG_BAD_SYMREF_POINTEE,
+					       "point to target outside gitdir");
+			goto cleanup;
 		}
+
+		strbuf_addstr(&referent, pointee_name);
+		ret = files_fsck_symref_target(o, &report, refname.buf,
+					       &referent, &pointee_path,
+					       symbolic_link);
 	}
 
 cleanup:
@@ -3576,6 +3607,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
 	strbuf_release(&pointee_path);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index e8fc2ef015..c6e93e4757 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -228,4 +228,48 @@ test_expect_success 'symbolic ref content should be checked' '
 	test_cmp expect err
 '
 
+test_expect_success SYMLINKS 'symbolic ref (symbolic link) content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git checkout -b a/b/branch-2 &&
+
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: point to target outside gitdir
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./".branch" $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.0


* Re: [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro
  2024-08-18 15:01   ` [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro shejialuo
@ 2024-08-20 16:25     ` Junio C Hamano
  2024-08-21 12:49       ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-20 16:25 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
> "referent" is NULL. So, we need to always initialize these parameters to
> NULL instead of letting them point to anywhere when creating a new
> "fsck_ref_report" structure.

The above is correct, but ...

>  	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
> -		struct fsck_ref_report report = { .path = NULL };
> +		struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;

... the code without this patch is already doing so.

When designated initializers are used to initialize a struct, all
members that are not initialized explicitly are implicitly
initialized the same as for objects that have static storage
duration (meaning: pointers are initialized to NULL, arithmetics are
initialized to zero).

So I do not quite see why this change is needed.  By hiding the fact
that the "report" structure is zero-initialized behind the macro, it
makes it less obvious that we are clearing everything.

If the patch were to rewrite the above like so:

		struct fsck_ref_report report = { 0 };

it would make it even more clear that everything is zero
initialized, and also makes it obvious that .path member is not any
special.
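To make the point concrete, here is a small self-contained sketch. "struct report_sketch" is a hypothetical, simplified stand-in for "struct fsck_ref_report" that has only the "path", "oid" and "referent" members mentioned in this thread. Both initializer forms leave every pointer NULL, as the C rule on implicit initialization requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct fsck_ref_report. */
struct report_sketch {
	const char *path;
	const void *oid;
	const char *referent;
};

static int is_cleared(const struct report_sketch *r)
{
	assert(r != NULL);
	return r->path == NULL && r->oid == NULL && r->referent == NULL;
}

/* Designated initializer: members not named get static-storage defaults (NULL). */
static int designated_init_is_cleared(void)
{
	struct report_sketch report = { .path = NULL };
	return is_cleared(&report);
}

/* "{ 0 }": same result, but it says "clear everything" up front. */
static int zero_init_is_cleared(void)
{
	struct report_sketch report = { 0 };
	return is_cleared(&report);
}
```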

Thanks.

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-18 15:01   ` [PATCH v1 2/4] ref: add regular ref content check for files backend shejialuo
@ 2024-08-20 16:49     ` Junio C Hamano
  2024-08-21 14:21       ` shejialuo
  2024-08-22  8:46       ` Patrick Steinhardt
  2024-08-22  8:48     ` Patrick Steinhardt
  1 sibling, 2 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-20 16:49 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> We implicitly reply on "git-fsck(1)" to check the consistency of regular

"reply" -> "rely", I think.

> refs. However, when parsing the regular refs for files backend, we allow
> the ref content to end with no newline or contain some garbages. We
> should warn the user about above situations.

Hmph, should we?  

If the content is short (e.g., in SHA-1 repository it only has 39
hexdigit) even if that may be sufficient to uniquely name the
object, we should warn about it, of course.  A file that has
64-hexdigit with a terminating LF at the end may be a valid file to
be in $GIT_DIR/refs/ hierarchy in a SHA-256 repository, but such a
file in a SHA-1 repository should also be subject to a warning, as
it could be a sign that somebody screwed up object format
conversion.

But a file that has only 40-hexdigit without a terminating LF at the
end?  Or a file that has 40-hexdigit followed by a CRLF instead of
LF?  Or a file that has the identical content as a valid ref on its
first line, but has extra stuff on its second and subsequent lines?

What does the name-to-object-name-mapping layer (aka "get_oid" API)
do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
they are treated as valid ref in the "normal" code path, it needs a
strong justification to tighten the rules retroactively, much
stronger than "Our current code, and any of our older versions,
would have written such a file as a loose ref with our code."

"What are we protecting us from with this tightening?" is the
question we should be asking ourselves, when evaluating each of
these new rules that fsck used not to care about.

* Re: [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro
  2024-08-20 16:25     ` Junio C Hamano
@ 2024-08-21 12:49       ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-21 12:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Tue, Aug 20, 2024 at 09:25:56AM -0700, Junio C Hamano wrote:
> >  	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
> > -		struct fsck_ref_report report = { .path = NULL };
> > +		struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
> 
> ... the code without this patch is already doing so.
> 
> When designated initializers are used to initialize a struct, all
> members that are not initialized explicitly are implicitly
> initialized the same as for objects that have static storage
> duration (meaning: pointers are initialized to NULL, arithmetics are
> initialized to zero).
> 
> So I do not quite see why this change is needed.  By hiding the fact
> that the "report" structure is zero-initialized behind the macro, it
> makes it less obvious that we are clearing everything.
> 
> If the patch were to rewrite the above like so:
> 
> 		struct fsck_ref_report report = { 0 }
> 
> it would make it even more clear that everything is zero
> initialized, and also makes it obvious that .path member is not any
> special.
> 

Yes, I should use this way. Thanks.

> Thanks.

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-20 16:49     ` Junio C Hamano
@ 2024-08-21 14:21       ` shejialuo
  2024-08-22  8:46       ` Patrick Steinhardt
  1 sibling, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-21 14:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Tue, Aug 20, 2024 at 09:49:23AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We implicitly reply on "git-fsck(1)" to check the consistency of regular
> 
> "reply" -> "rely", I think.

I will fix in the next version.

> > refs. However, when parsing the regular refs for files backend, we allow
> > the ref content to end with no newline or contain some garbages. We
> > should warn the user about above situations.
> 
> Hmph, should we?  
>

I am very sorry about this. I should not have used "should" here. I
didn't give compelling reasons why we need to introduce such checks; I
just told the reviewer "we should warn". I will try to avoid this
mistake of not giving enough motivation.

> What does the name-to-object-name-mapping layer (aka "get_oid" API)
> do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
> they are treated as valid ref in the "normal" code path, it needs a
> strong justification to tighten the rules retroactively, much
> stronger than "Our current code, and any of our older versions,
> would have written such a file as a loose ref with our code."
> 

Let me first talk about what happens when we run the following
command:

  $ git checkout bad-branch

I used "gdb" to find the following call sequence:

  "cmd_checkout" -> "checkout_main" -> "parse_branchname_arg" ->
  ... -> "get_oid_basic" -> "repo_dwim_ref" -> ... ->
  "parse_loose_ref_contents" -> "parse_oid_hex_algop" ->
  "get_oid_hex_algop"

Looking into "object-name.c::get_oid_basic": if we pass an actual
object ID, it calls "get_oid_hex_algop" directly. Otherwise, it
executes the following code:

  if (!len && reflog_len)
      refs_found = ...;
  else if (reflog_len)
      refs_found = ...
  else
      refs_found = repo_dwim_ref(r, str, len, oid, &real_ref, !fatal);

  if (!refs_found)
      return -1;

As we can see, when "repo_dwim_ref" finds no corresponding ref,
"get_oid_basic" returns -1. This leads to an important conclusion:

  The "get_oid_basic" function relies on "repo_dwim_ref" function to
  parse the ref and get the pointee "oid". So, it uses the interfaces
  provided by ref backend.

Next, let's look at what "parse_loose_ref_contents" does for regular
refs.

  int parse_loose_ref_contents(...)
  {
      ...
      if (parse_oid_hex_algop(buf, oid, *p, algop) ||
         (*p != '\0' && !isspace(*p))) {
            *type |= REF_ISBROKEN;
            *failure_errno = EINVAL;
            return -1;
      }
      return 0;
  }

Let's continue with what "parse_oid_hex_algop" does:

  int parse_oid_hex_algop(...)
  {
      int ret = get_oid_hex_algop(hex, oid, algop);
      if (!ret) {
          *end = hex + algop->hexsz;
      }
      return ret;
  }

If "get_oid_hex_algop" succeeds, we set the "end" pointer here.
"get_oid_hex_algop" eventually calls the "get_hash_hex_algop"
function:

  static int get_hash_hex_algop(...)
  {
      int i;
      for (i = 0; i < algop->rawsz; i++) {
          int val = hex2chr(hex);
          if (val < 0)
              return -1;
          *hash++ = val;
          hex += 2;
      }
      return 0;
  }

This function converts the hex string into raw bytes, consuming exactly
"algop->rawsz" pairs of hex digits. From the above code, we can conclude
the following:

1. "41053a9084501db79c72b14e8a5a0b67de3f91ae" is correct, because it is
parsed successfully by "get_hash_hex_algop" and "*p == '\0'".
2. "41053a9084501db79c72b14e8a5a0b67de3f91aef" is not correct: it is
parsed successfully by "get_hash_hex_algop", but "*p != '\0'" and
"isspace(*p)" is false, so the check in "parse_loose_ref_contents"
fails.
3. "1053a9084501db79c72b14e8a5a0b67de3f91a" is not correct: it cannot be
parsed successfully by "get_hash_hex_algop".
4. "41053a9084501db79c72b14e8a5a0b67de3f91ae garbage" is correct,
because it is parsed successfully by "get_hash_hex_algop" and
"isspace(*p)" is true.
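The four cases above boil down to one rule, sketched below under the assumption that it faithfully mirrors "parse_loose_ref_contents" for regular refs: exactly "hexsz" hex digits (40 for SHA-1, 64 for SHA-256), followed by a byte that is NUL or whitespace. The names "loose_ref_accepted" and "hexval_sketch" are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit (either case), or -1 if it is not a hex digit. */
static int hexval_sketch(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Returns 1 if "buf" would be accepted as a regular loose ref: "hexsz"
 * hex digits, then NUL or whitespace. Nothing after that byte is looked at.
 */
static int loose_ref_accepted(const char *buf, size_t hexsz)
{
	size_t i;
	const char *p;

	assert(buf != NULL);
	for (i = 0; i < hexsz; i++)
		if (hexval_sketch(buf[i]) < 0)
			return 0; /* too short (hit NUL) or not hex */
	p = buf + hexsz;
	return *p == '\0' || isspace((unsigned char)*p);
}
```

The same rule also rejects a 64-hex-digit SHA-256 name in a SHA-1 repository, because the 41st byte is another hex digit.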

With the above discussion in mind, let me answer your comments one by one.

> If the content is short (e.g., in SHA-1 repository it only has 39
> hexdigit) even if that may be sufficient to uniquely name the
> object, we should warn about it, of course.

When the content is short, even though it may be sufficient to identify
the object, we should still report an error. This is because we care
about the ref itself. As the above discussion shows, "object-name.c"
relies entirely on the interfaces provided by the ref backend.
"get_hash_hex_algop" rejects such content, so
"object-name.c::get_oid_basic" eventually fails and returns -1.

> A file that has 64-hexdigit with a terminating LF at the end may be
> a valid file to be in $GIT_DIR/refs/ hierarchy in a SHA-256
> repository, but such a file in a SHA-1 repository should also be
> subject to a warning, as it could be a sign that somebody screwed up
> object format conversion.

I agree with this idea. But in this implementation, we want to reuse
"parse_loose_ref_contents" to check the consistency of regular refs.
In a SHA-1 repository, "parse_loose_ref_contents" already rejects such
content. However, I don't think we need to tell the user that "the
content is 64 hex digits ...". We just report "bad ref content", which
is enough to tell the user that something is wrong and that they need
to check the ref database.

> But a file that has only 40-hexdigit without a terminating LF at the
> end?  Or a file that has 40-hexdigit followed by a CRLF instead of
> LF?  Or a file that has the identical content as a valid ref on its
> first line, but has extra stuff on its second and subsequent lines?

This is the core reason why we want to introduce a stricter check. In
the current "parse_loose_ref_contents" function, as long as the byte
right after the hex digits is '\0' or whitespace (a space, LF, or the
CR of a CRLF), the content of the ref is considered OK.

But in my view, we should warn the user about these situations, because
the original code does not check refs strictly for the files backend.
Also, I think normal users should not be editing the ref database
directly. If we find garbage in the ref database, that could be a sign
for the user: "Watch out! Something may be wrong".
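For discussion, a stricter classification could look like the sketch below. It is hypothetical: the enum and function names are made up, and this is not necessarily what the patch implements. Content that the reader already rejects stays "bad". Content that the reader accepts but that Git never writes ("<hex>" without LF, CRLF, or extra lines) gets its own verdict, so it can be reported as a warning rather than an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum ref_content_verdict {
	REF_CONTENT_OK,              /* exactly "<hex>\n", what Git writes */
	REF_CONTENT_MISSING_NEWLINE, /* exactly "<hex>" with no terminator */
	REF_CONTENT_TRAILING,        /* accepted by the reader, but extra bytes */
	REF_CONTENT_BAD              /* rejected by the reader as well */
};

static enum ref_content_verdict classify_regular_ref(const char *buf,
						     size_t hexsz)
{
	const char *p;

	assert(buf != NULL);
	if (strspn(buf, "0123456789abcdefABCDEF") < hexsz)
		return REF_CONTENT_BAD; /* too short or not hex */

	p = buf + hexsz;
	if (*p == '\0')
		return REF_CONTENT_MISSING_NEWLINE;
	if (!isspace((unsigned char)*p))
		return REF_CONTENT_BAD; /* e.g. a 41st hex digit */
	if (p[0] == '\n' && p[1] == '\0')
		return REF_CONTENT_OK;
	return REF_CONTENT_TRAILING; /* CRLF, extra spaces, more lines, ... */
}
```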

> "What are we protecting us from with this tightening?" is the
> question we should be asking ourselves, when evaluating each of
> these new rules that fsck used not to care about.

That's a hard question, really. I find it hard to decide what we should
do, and the motivation is hard to describe. But I hope this reply makes
things clearer.

Thanks,
Jialuo

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-20 16:49     ` Junio C Hamano
  2024-08-21 14:21       ` shejialuo
@ 2024-08-22  8:46       ` Patrick Steinhardt
  2024-08-22 16:13         ` Junio C Hamano
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-22  8:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: shejialuo, git, Karthik Nayak

On Tue, Aug 20, 2024 at 09:49:23AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We implicitly reply on "git-fsck(1)" to check the consistency of regular
> 
> "reply" -> "rely", I think.
> 
> > refs. However, when parsing the regular refs for files backend, we allow
> > the ref content to end with no newline or contain some garbages. We
> > should warn the user about above situations.
> 
> Hmph, should we?  
> 
> If the content is short (e.g., in SHA-1 repository it only has 39
> hexdigit) even if that may be sufficient to uniquely name the
> object, we should warn about it, of course.  A file that has
> 64-hexdigit with a terminating LF at the end may be a valid file to
> be in $GIT_DIR/refs/ hierarchy in a SHA-256 repository, but such a
> file in a SHA-1 repository should also be subject to a warning, as
> it could be a sign that somebody screwed up object format
> conversion.
> 
> But a file that has only 40-hexdigit without a terminating LF at the
> end?  Or a file that has 40-hexdigit followed by a CRLF instead of
> LF?  Or a file that has the identical content as a valid ref on its
> first line, but has extra stuff on its second and subsequent lines?
> 
> What does the name-to-object-name-mapping layer (aka "get_oid" API)
> do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
> they are treated as valid ref in the "normal" code path, it needs a
> strong justification to tighten the rules retroactively, much
> stronger than "Our current code, and any of our older versions,
> would have written such a file as a loose ref with our code."
> 
> "What are we protecting us from with this tightening?" is the
> question we should be asking ourselves, when evaluating each of
> these new rules that fsck used not to care about.

I'd say filesystem corruption, buggy implementations and compatibility
with other implementations of Git. The format for refs does not allow
for any other information than either an object ID for plain refs, and
the referee for symbolic refs. The fact that we do accept that is a mere
implementation detail because we reuse the same function to parse refs
that we also use for pseudorefs. And these _can_ have additional data.

So any reference that contains additional data is not a proper ref and
thus should be warned about from my point of view. No Git tooling should
write them, so if something does it's a red flag to me.
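Applied to symbolic refs, the loop in "files_fsck_symref_target" amounts to the classification sketched below. This is a hypothetical standalone version, and the enum and function names are made up. It assumes the loose-ref reader accepts "ref:" followed by optional whitespace. It also keeps the patch's behavior of counting only ' ' and '\n', so a trailing CRLF is not flagged by this scan.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum symref_verdict {
	SYMREF_OK,                  /* "ref: refs/<name>\n" */
	SYMREF_NOT_SYMREF,          /* no "ref:" prefix at all */
	SYMREF_OUTSIDE_REFS,        /* pointee not under "refs/" */
	SYMREF_MISSING_NEWLINE,     /* cf. refMissingNewline */
	SYMREF_TRAILING_WHITESPACE, /* cf. trailingRefContent */
	SYMREF_GARBAGE              /* cf. badRefContent */
};

static enum symref_verdict classify_symref(const char *buf)
{
	unsigned int space_num = 0, newline_num = 0;
	const char *p;

	assert(buf != NULL);

	/* Assumed to match the reader: "ref:" then optional whitespace. */
	if (strncmp(buf, "ref:", 4))
		return SYMREF_NOT_SYMREF;
	for (p = buf + 4; isspace((unsigned char)*p); p++)
		;
	if (strncmp(p, "refs/", 5))
		return SYMREF_OUTSIDE_REFS;

	/* Same scan as the loop in files_fsck_symref_target(). */
	for (p += 5; *p; p++) {
		if ((space_num || newline_num) && !isspace((unsigned char)*p))
			return SYMREF_GARBAGE;
		if (*p == '\n')
			newline_num++;
		else if (*p == ' ')
			space_num++;
	}

	if (space_num || newline_num > 1)
		return SYMREF_TRAILING_WHITESPACE;
	if (!newline_num)
		return SYMREF_MISSING_NEWLINE;
	return SYMREF_OK;
}
```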

Patrick

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-18 15:01   ` [PATCH v1 2/4] ref: add regular ref content check for files backend shejialuo
  2024-08-20 16:49     ` Junio C Hamano
@ 2024-08-22  8:48     ` Patrick Steinhardt
  2024-08-22 12:06       ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-22  8:48 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Aug 18, 2024 at 11:01:44PM +0800, shejialuo wrote:
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	const char *trailing = NULL;
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISREG(iter->st.st_mode)) {

We can avoid having to indent the remainder of this function if we `goto
cleanup` here.
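The suggested shape is the usual early exit. Test the negative condition first and jump to the shared cleanup label, so the rest of the function is not nested inside "if (S_ISREG(...))". Below is a hypothetical sketch; the function, its "is_regular" flag and the "missing newline" result are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 0 when there is nothing to report, 1 for a (made-up)
 * "missing newline" finding, and -1 on allocation failure.
 */
static int check_ref_file(int is_regular, const char *content)
{
	char *copy = NULL;
	size_t len;
	int ret = 0;

	assert(content != NULL);

	/* Bail out early instead of wrapping everything in "if (is_regular) { ... }". */
	if (!is_regular)
		goto cleanup;

	/* Stand-in for the strbuf work that must be released on every path. */
	len = strlen(content);
	copy = malloc(len + 1);
	if (!copy) {
		ret = -1;
		goto cleanup;
	}
	memcpy(copy, content, len + 1);

	/* The actual checks stay at the top indentation level. */
	if (len == 0 || copy[len - 1] != '\n')
		ret = 1;

cleanup:
	free(copy); /* free(NULL) is a no-op, so this is safe on every path */
	return ret;
}
```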

Patrick

* Re: [PATCH v1 3/4] ref: add symbolic ref content check for files backend
  2024-08-18 15:01   ` [PATCH v1 3/4] ref: add symbolic " shejialuo
@ 2024-08-22  8:53     ` Patrick Steinhardt
  2024-08-22 12:42       ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-22  8:53 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Aug 18, 2024 at 11:01:52PM +0800, shejialuo wrote:
> We have already introduced the checks for regular refs. There is no need
> to check the consistency of the target which the symbolic ref points to.
> Instead, we just check the content of the symbolic ref itself.
> 
> In order to check the content of the symbolic ref, create a function
> "files_fsck_symref_target". It will first check whether the "pointee" is
> under the "refs/" directory and then we will check the "pointee" itself.
> 
> There is no specification about the content of the symbolic ref.
> Although we do write "ref: %s\n" to create a symbolic ref by using
> "git-symbolic-ref(1)" command. However, this is not mandatory. We still
> accept symbolic refs with null trailing garbage. Put it more specific,
> the following are correct:
> 
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
> 
> But we do not allow any non-null trailing garbage. The following are bad
> symbolic contents.
> 
> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
> 
> In order to provide above checks, we will traverse the "pointee" to
> report the user whether this is null-garbage or no newline. And if
> symbolic refs contain non-null garbage, we will report
> "FSCK_MSG_BAD_REF_CONTENT" to the user.
> 
> Then, we will check the name of the "pointee" is correct by using
> "check_refname_format". And then if we can access the "pointee_path" in
> the file system, we should ensure that the file type is correct.
> 
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  Documentation/fsck-msgids.txt |  3 ++
>  fsck.h                        |  1 +
>  refs/files-backend.c          | 87 +++++++++++++++++++++++++++++++++++
>  t/t0602-reffiles-fsck.sh      | 52 +++++++++++++++++++++
>  4 files changed, 143 insertions(+)
> 
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index 1688c2f1fe..73587661dc 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -28,6 +28,9 @@
>  `badRefName`::
>  	(ERROR) A ref has an invalid format.
>  
> +`badSymrefPointee`::
> +	(ERROR) The pointee of a symref is bad.
> +
>  `badTagName`::
>  	(INFO) A tag has an invalid format.
>  
> diff --git a/fsck.h b/fsck.h
> index 975d9b9da9..985b674dd9 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -34,6 +34,7 @@ enum fsck_msg_type {
>  	FUNC(BAD_REF_CONTENT, ERROR) \
>  	FUNC(BAD_REF_FILETYPE, ERROR) \
>  	FUNC(BAD_REF_NAME, ERROR) \
> +	FUNC(BAD_SYMREF_POINTEE, ERROR) \
>  	FUNC(BAD_TIMEZONE, ERROR) \
>  	FUNC(BAD_TREE, ERROR) \
>  	FUNC(BAD_TREE_SHA1, ERROR) \
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index ae71692f36..bfb8d338d2 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3434,12 +3434,92 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +/*
> + * Check the symref "pointee_name" and "pointee_path". The caller should
> + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
> + * would be the content after "refs:".
> + */
> +static int files_fsck_symref_target(struct fsck_options *o,
> +				    struct fsck_ref_report *report,
> +				    const char *refname,
> +				    struct strbuf *pointee_name,
> +				    struct strbuf *pointee_path)
> +{
> +	unsigned int newline_num = 0;
> +	unsigned int space_num = 0;
> +	const char *p = NULL;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
> +
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	while (*p != '\0') {

We typically write this `while (*p)`.

> +		if ((space_num || newline_num) && !isspace(*p)) {
> +			ret = fsck_report_ref(o, report,
> +					      FSCK_MSG_BAD_REF_CONTENT,
> +					      "contains non-null garbage");
> +			goto out;
> +		}
> +
> +		if (*p == '\n') {
> +			newline_num++;
> +		} else if (*p == ' ') {
> +			space_num++;
> +		}
> +		p++;
> +	}

Can't we replace this with a single `strchr('\n')` call to check for the
newline and then verify that the next character is a `\0`? The check for
spaces would then be handled by `check_refname_format()`.

> +	/*
> +	 * Missing target should not be treated as any error worthy event and
> +	 * not even warn. It is a common case that a symbolic ref points to a
> +	 * ref that does not exist yet. If the target ref does not exist, just
> +	 * skip the check for the file type.
> +	 */
> +	if (lstat(pointee_path->buf, &st) < 0)
> +		goto out;
> +
> +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to an invalid file type");
> +		goto out;
> +	}

What exactly are we guarding against here? Don't we already verify that
files in `refs/` have the correct type? Or are we checking that it does
not point to a directory?

Patrick

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-22  8:48     ` Patrick Steinhardt
@ 2024-08-22 12:06       ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-22 12:06 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Thu, Aug 22, 2024 at 10:48:30AM +0200, Patrick Steinhardt wrote:
>
> We can avoid having to indent the remainder of this function if we `goto
> cleanup` here.
> 

Yes, I actually considered this approach, but I wanted to avoid "goto".
However, the extra indentation is noisy too. I will fix this in the next
version.

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 3/4] ref: add symbolic ref content check for files backend
  2024-08-22  8:53     ` Patrick Steinhardt
@ 2024-08-22 12:42       ` shejialuo
  2024-08-23  5:36         ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-08-22 12:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote:


> > +		if ((space_num || newline_num) && !isspace(*p)) {
> > +			ret = fsck_report_ref(o, report,
> > +					      FSCK_MSG_BAD_REF_CONTENT,
> > +					      "contains non-null garbage");
> > +			goto out;
> > +		}
> > +
> > +		if (*p == '\n') {
> > +			newline_num++;
> > +		} else if (*p == ' ') {
> > +			space_num++;
> > +		}
> > +		p++;
> > +	}
> 
> Can't we replace this with a single `strchr('\n')` call to check for the
> newline and then verify that the next character is a `\0`? The check for
> spaces would then be handled by `check_refname_format()`.
> 

We cannot. Think about this situation.

  "ref: refs/heads/master  \n   "

We find that the character after '\n' is not '\0', so we leave the rest
to "check_refname_format". But "check_refname_format" reports an error
here, even though this is an allowed symref.

Still, I think using `strchr` is a nice idea. I will try to find an
elegant way to handle this logic.
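One possible shape for such logic, offered only as a sketch: instead of
requiring '\0' right after the first newline, classify everything that
follows the refname. The names `classify_symref_tail` and `enum tail_kind`
are hypothetical, and the sketch assumes the refname itself contains no
whitespace (check_refname_format() rejects spaces and control
characters), so `strcspn()` can find where the refname ends.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical categories for whatever follows the symref target. */
enum tail_kind {
	TAIL_OK,               /* refname followed by exactly one '\n' */
	TAIL_MISSING_NEWLINE,  /* refname with nothing after it */
	TAIL_NULL_GARBAGE,     /* only extra whitespace after the refname */
	TAIL_NON_NULL_GARBAGE  /* something other than whitespace follows */
};

/* "target" is the symref content after "ref: ". */
static enum tail_kind classify_symref_tail(const char *target)
{
	/* Assumes the refname has no whitespace, so this finds its end. */
	const char *end = target + strcspn(target, " \t\n");
	const char *p;

	if (!*end)
		return TAIL_MISSING_NEWLINE;
	if (!strcmp(end, "\n"))
		return TAIL_OK;

	/* Any non-whitespace byte after the refname is real garbage. */
	for (p = end; *p; p++)
		if (!isspace((unsigned char)*p))
			return TAIL_NON_NULL_GARBAGE;
	return TAIL_NULL_GARBAGE;
}
```

With this, "refs/heads/master  \n   " is classified as null-garbage
rather than being handed to check_refname_format(), while
"refs/heads/master garbage\n" is non-null garbage.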

> > +	/*
> > +	 * Missing target should not be treated as any error worthy event and
> > +	 * not even warn. It is a common case that a symbolic ref points to a
> > +	 * ref that does not exist yet. If the target ref does not exist, just
> > +	 * skip the check for the file type.
> > +	 */
> > +	if (lstat(pointee_path->buf, &st) < 0)
> > +		goto out;
> > +
> > +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> > +				      "points to an invalid file type");
> > +		goto out;
> > +	}
> 
> What exactly are we guarding against here? Don't we already verify that
> files in `refs/` have the correct type? Or are we checking that it does
> not point to a directory?
> 

When scanning the "refs" directory, we check every file in the ref
database, but we skip directories. So this check is what catches a
symref that points to a directory. If the symref instead points to a
file of a bad type, for example "refs/heads/bad-file" being a block
device, we will first report that "refs/heads/bad-file" is a bad file
and then report that the symref points to the bad file
"refs/heads/bad-file".

This is a little redundant, but we can tolerate it because we still need
to guard against the directory case.

So I think we can leave this as is.
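A minimal sketch of the case this check exists for, mirroring the
lstat() logic from the patch in a standalone helper. The name
`check_pointee_type` is hypothetical; like the patch, it treats any
lstat() failure as a missing (not yet created) target.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 0 if the pointee is acceptable (missing,
 * regular file, or symlink) and -1 if it exists with any other type,
 * for example a directory.
 */
static int check_pointee_type(const char *pointee_path)
{
	struct stat st;

	/* A missing target is fine: the symref may point to an unborn ref. */
	if (lstat(pointee_path, &st) < 0)
		return 0;

	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode))
		return -1;
	return 0;
}
```

For a directory pointee this returns -1, which is the case the walk over
"refs/" never reports on its own.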

> Patrick

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-22  8:46       ` Patrick Steinhardt
@ 2024-08-22 16:13         ` Junio C Hamano
  2024-08-22 16:17           ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-22 16:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: shejialuo, git, Karthik Nayak

Patrick Steinhardt <ps@pks.im> writes:

> So any reference that contains additional data is not a proper ref and
> thus should be warned about from my point of view. No Git tooling should
> write them, so if something does it's a red flag to me.

If you find such a file in $GIT_DIR/refs/ hierarchy, because our
consumer side has been looser than necessary forever, and we never
have written such a file ourselves, it is a sign that a third-party
tool wrote it, and that the third-party tool used our reader
implementation as the specification.  That is why I am hesitant to
retroactively tighten the rules like this patch does.

Thanks.


^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-22 16:13         ` Junio C Hamano
@ 2024-08-22 16:17           ` Junio C Hamano
  2024-08-23  7:21             ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-22 16:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: shejialuo, git, Karthik Nayak

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> So any reference that contains additional data is not a proper ref and
>> thus should be warned about from my point of view. No Git tooling should
>> write them, so if something does it's a red flag to me.
>
> If you find such a file in $GIT_DIR/refs/ hierarchy, because our
> consumer side has been looser than necessary forever, and we never
> have written such a file ourselves, it is a sign that a third-party
> tool wrote it, and that the third-party tool used our reader
> implementation as the specification.  That is why I am hesitant to
> retroactively tighten the rules like this patch does.

I forgot to add my recommended course of action, without which a
review is worth much less X-<.

I am OK if we tightened the rules retroactively, as long as it
starts as a probing check (i.e. "info: we found an unusual thing
in the wild. Please report this to us so that we can ask you for
more details like how such a ref that would violate a rule that was
retroactively tightened got there", not "error: malformed ref").

Thanks.

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 3/4] ref: add symbolic ref content check for files backend
  2024-08-22 12:42       ` shejialuo
@ 2024-08-23  5:36         ` Patrick Steinhardt
  2024-08-23 11:37           ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-23  5:36 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Thu, Aug 22, 2024 at 08:42:02PM +0800, shejialuo wrote:
> On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote:
> 
> 
> > > +		if ((space_num || newline_num) && !isspace(*p)) {
> > > +			ret = fsck_report_ref(o, report,
> > > +					      FSCK_MSG_BAD_REF_CONTENT,
> > > +					      "contains non-null garbage");
> > > +			goto out;
> > > +		}
> > > +
> > > +		if (*p == '\n') {
> > > +			newline_num++;
> > > +		} else if (*p == ' ') {
> > > +			space_num++;
> > > +		}
> > > +		p++;
> > > +	}
> > 
> > Can't we replace this with a single `strchr('\n')` call to check for the
> > newline and then verify that the next character is a `\0`? The check for
> > spaces would then be handled by `check_refname_format()`.
> > 
> 
> We cannot. Think about this situation.
> 
>   "ref: refs/heads/master  \n   "
> 
> We find that the next character of '\n' is not '\0'. Then we leave it to
> "check_refname_format". But "check_refname_format" will report an error
> here, but this is an allowed symref.

Wouldn't it be correct to warn about this? To me the above very much
looks like garbage after the refname, same like we'd also warn about
such garbage for direct refs.

Patrick

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-22 16:17           ` Junio C Hamano
@ 2024-08-23  7:21             ` Patrick Steinhardt
  2024-08-23 11:30               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-23  7:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: shejialuo, git, Karthik Nayak

On Thu, Aug 22, 2024 at 09:17:08AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >> So any reference that contains additional data is not a proper ref and
> >> thus should be warned about from my point of view. No Git tooling should
> >> write them, so if something does it's a red flag to me.
> >
> > If you find such a file in $GIT_DIR/refs/ hierarchy, because our
> > consumer side has been looser than necessary forever, and we never
> > have written such a file ourselves, it is a sign that a third-party
> > tool wrote it, and that the third-party tool used our reader
> > implementation as the specification.  That is why I am hesitant to
> > retroactively tighten the rules like this patch does.
> 
> I forgot to add my recommended course of action, without which a
> review is worth much less X-<.
> 
> I am OK if we tightened the rules retroactively, as long as it
> starts as a probing check (i.e. "info: we found an unusual thing
> in the wild. Please report this to us so that we can ask you for
> more details like how such a ref that would violate a rule that was
> retroactively tightened got there", not "error: malformed ref").

Okay, that makes sense. The fsck infrastructure does have info message
types, so this should certainly be doable. I'd argue that we might want
to make this an `FSCK_WARN`, but I'm also fine with iteratively bumping
up the severity from INFO to WARN to ERROR when we don't observe any
complaints about this tightening.

Patrick

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
  2024-08-23  7:21             ` Patrick Steinhardt
@ 2024-08-23 11:30               ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-23 11:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Karthik Nayak

On Fri, Aug 23, 2024 at 09:21:14AM +0200, Patrick Steinhardt wrote:
> On Thu, Aug 22, 2024 at 09:17:08AM -0700, Junio C Hamano wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> > 
> > > Patrick Steinhardt <ps@pks.im> writes:
> > >
> > >> So any reference that contains additional data is not a proper ref and
> > >> thus should be warned about from my point of view. No Git tooling should
> > >> write them, so if something does it's a red flag to me.
> > >
> > > If you find such a file in $GIT_DIR/refs/ hierarchy, because our
> > > consumer side has been looser than necessary forever, and we never
> > > have written such a file ourselves, it is a sign that a third-party
> > > tool wrote it, and that the third-party tool used our reader
> > > implementation as the specification.  That is why I am hesitant to
> > > retroactively tighten the rules like this patch does.
> > 
> > I forgot to add my recommended course of action, without which a
> > review is worth much less X-<.
> > 
> > I am OK if we tightened the rules retroactively, as long as it
> > starts as a probing check (i.e. "info: we found an unusual thing
> > in the wild. Please report this to us so that we can ask you for
> > more details like how such a ref that would violate a rule that was
> > retroactively tightened got there", not "error: malformed ref").
> 
> Okay, that makes sense. The fsck infrastructure does have info message
> types, so this should certainly be doable. I'd argue that we might want
> to make this an `FSCK_WARN`, but I'm also fine with iteratively bumping
> up the severity from INFO to WARN to ERROR when we don't observe any
> complaints about this tightening.
> 

From the implementation's perspective, I am not sure there is any
difference between info and warn. Do we really distinguish info from
warn in the code?

Let's look at the "fsck_vreport" function (it is a new function, but I
did not change the implementation):

  static int fsck_vreport(...)
  {
      enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);

      if (msg_type == FSCK_FATAL)
          msg_type = FSCK_ERROR;
      if (msg_type == FSCK_INFO)
          msg_type = FSCK_WARN;

      ...
  }

So "FSCK_INFO" is eventually converted to "FSCK_WARN", which is
confusing.

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v1 3/4] ref: add symbolic ref content check for files backend
  2024-08-23  5:36         ` Patrick Steinhardt
@ 2024-08-23 11:37           ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-23 11:37 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Fri, Aug 23, 2024 at 07:36:10AM +0200, Patrick Steinhardt wrote:
> On Thu, Aug 22, 2024 at 08:42:02PM +0800, shejialuo wrote:
> > On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote:
> > 
> > 
> > > > +		if ((space_num || newline_num) && !isspace(*p)) {
> > > > +			ret = fsck_report_ref(o, report,
> > > > +					      FSCK_MSG_BAD_REF_CONTENT,
> > > > +					      "contains non-null garbage");
> > > > +			goto out;
> > > > +		}
> > > > +
> > > > +		if (*p == '\n') {
> > > > +			newline_num++;
> > > > +		} else if (*p == ' ') {
> > > > +			space_num++;
> > > > +		}
> > > > +		p++;
> > > > +	}
> > > 
> > > Can't we replace this with a single `strchr('\n')` call to check for the
> > > newline and then verify that the next character is a `\0`? The check for
> > > spaces would then be handled by `check_refname_format()`.
> > > 
> > 
> > We cannot. Think about this situation.
> > 
> >   "ref: refs/heads/master  \n   "
> > 
> > We find that the next character of '\n' is not '\0'. Then we leave it to
> > "check_refname_format". But "check_refname_format" will report an error
> > here, but this is an allowed symref.
> 
> Wouldn't it be correct to warn about this? To me the above very much
> looks like garbage after the refname, same like we'd also warn about
> such garbage for direct refs.
> 

Yes, we should warn about this. But only null-garbage (whitespace only)
is allowed for a symref. The following situation is bad:

  "ref: refs/heads/master  \n   garbage\n"

We should report an error here; from my perspective, it's a fatal error.
However, let's decide how to handle this once we know which fsck message
level we should use.

> Patrick

^ permalink raw reply	[flat|nested] 209+ messages in thread

* [PATCH v2 0/4] add ref content check for files backend
  2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
                     ` (3 preceding siblings ...)
  2024-08-18 15:02   ` [PATCH v1 4/4] ref: add symlink ref consistency " shejialuo
@ 2024-08-27 16:04   ` shejialuo
  2024-08-27 16:07     ` [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
                       ` (6 more replies)
  4 siblings, 7 replies; 209+ messages in thread
From: shejialuo @ 2024-08-27 16:04 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This new version addresses the following review comments:

1. Following Junio's advice, just use "{0}" to zero-initialize the
"fsck_ref_report" structure. This version handles this in [PATCH v2 1/4].
2. Following Patrick's advice, use "strrchr" instead of looping to make
the code cleaner in [PATCH v2 3/4].
3. Use "goto" to reduce indentation.
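For reviewers, the strrchr-based newline check from [PATCH v2 3/4] (see
the range-diff) can be isolated like this; the function name
`symref_missing_newline` is hypothetical, but the condition is the one
used in the patch.

```c
#include <string.h>

/*
 * The content is treated as missing its trailing newline unless the
 * last '\n' is also the last character of the string.
 */
static int symref_missing_newline(const char *target)
{
	const char *newline_pos = strrchr(target, '\n');

	return !newline_pos || *(newline_pos + 1) != '\0';
}
```

Note that "refs/heads/branch \n\n " counts as missing a newline because
its last '\n' is followed by a space, which matches the expected
"refMissingNewline" warning added to t0602-reffiles-fsck.sh.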

However, the most important question for this series is which fsck
message type to use. I have recorded the reasoning in the commit
message, but I want to explain the motivation in the cover letter as
well to make it easier for reviewers.

During the review of the first version, Junio suggested "FSCK_INFO" and
Patrick suggested "FSCK_WARN". I then asked what the difference between
"FSCK_INFO" and "FSCK_WARN" is, because the "fsck.c::fsck_vreport"
function converts "FSCK_INFO" to "FSCK_WARN" as follows:

    static int fsck_vreport(...)
    {
        enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);

        if (msg_type == FSCK_FATAL)
            msg_type = FSCK_ERROR;
        if (msg_type == FSCK_INFO)
            msg_type = FSCK_WARN;
        ...
    }

I then went back through the history. The fsck message types were first
set up in f27d05b170 (fsck: allow upgrading fsck warnings to errors,
2015-06-22):

  https://lore.kernel.org/git/cover.1418055173.git.johannes.schindelin@gmx.de/

Now I understand why we need "FSCK_INFO": when the "strict" field in
"fsck_options" is set, all fsck warnings become fsck errors, but infos
do not. For example, this change confirms my understanding:
4dd3b045f5 (fsck: downgrade tree badFilemode to "info", 2022-08-10).

As you can see, keeping a check at "info" makes it safer to introduce,
because "strict" cannot turn it into an error. So I agree with Junio
that, for now, we should use "FSCK_INFO" for trailing garbage and for
ref content that ends without a newline.
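To make the ordering concrete, here is a simplified model of the two
steps. It assumes, as I read fsck.c::fsck_msg_type(), that "strict"
upgrades only FSCK_WARN, and it ignores per-message configuration such
as fsck.<msg-id>. The enum and function names are stand-ins, not the
real fsck.h definitions, and the enum values are illustrative.

```c
/* Stand-ins for the fsck message types; values are illustrative. */
enum msg_type { MSG_IGNORE, MSG_INFO, MSG_WARN, MSG_ERROR, MSG_FATAL };

/* Lookup step: with "strict" set, only WARN is upgraded to ERROR. */
static enum msg_type lookup_type(enum msg_type dflt, int strict)
{
	if (strict && dflt == MSG_WARN)
		return MSG_ERROR;
	return dflt;
}

/* Reporting step: the conversion done in fsck_vreport(). */
static enum msg_type report_type(enum msg_type t)
{
	if (t == MSG_FATAL)
		return MSG_ERROR;
	if (t == MSG_INFO)
		return MSG_WARN;
	return t;
}
```

Because the strict upgrade happens before the INFO-to-WARN conversion,
an INFO-level check still prints a warning, but "strict" alone cannot
turn it into an error.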

But we should report fsck errors in the following two situations,
because "git-fsck(1)" already reports fsck errors for them when it
implicitly checks the ref database consistency:

1. "parse_loose_ref_contents" fails.
2. The symref content is bad (cannot be parsed).

Thanks,
Jialuo

shejialuo (4):
  ref: initialize "fsck_ref_report" with zero
  ref: add regular ref content check for files backend
  ref: add symbolic ref content check for files backend
  ref: add symlink ref check for files backend

 Documentation/fsck-msgids.txt |  12 +++
 fsck.h                        |   4 +
 refs.c                        |   2 +-
 refs/files-backend.c          | 179 +++++++++++++++++++++++++++++++-
 refs/refs-internal.h          |   2 +-
 t/t0602-reffiles-fsck.sh      | 185 ++++++++++++++++++++++++++++++++++
 6 files changed, 379 insertions(+), 5 deletions(-)

Range-diff against v1:
1:  9ed3026ac5 ! 1:  0367904c81 fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro
    @@ Metadata
     Author: shejialuo <shejialuo@gmail.com>
     
      ## Commit message ##
    -    fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro
    +    ref: initialize "fsck_ref_report" with zero
     
         In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
         "referent" is NULL. So, we need to always initialize these parameters to
         NULL instead of letting them point to anywhere when creating a new
         "fsck_ref_report" structure.
     
    -    In order to conveniently create a new "fsck_ref_report", add a new macro
    -    "FSCK_REF_REPORT_DEFAULT".
    +    The original code explicitly specifies the ".path" field to initialize
    +    the "fsck_ref_report" structure. However, it introduces confusion how we
    +    initialize the other fields. In order to avoid this, initialize the
    +    "fsck_ref_report" with zero to make clear that everything in
    +    "fsck_ref_report" is zero initialized.
     
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
         Signed-off-by: shejialuo <shejialuo@gmail.com>
     
    - ## fsck.h ##
    -@@ fsck.h: struct fsck_ref_report {
    - 	const char *referent;
    - };
    - 
    -+#define FSCK_REF_REPORT_DEFAULT { \
    -+	.path = NULL, \
    -+	.oid = NULL, \
    -+	.referent = NULL, \
    -+}
    -+
    - struct fsck_options {
    - 	fsck_walk_func walk;
    - 	fsck_error error_func;
    -
      ## refs/files-backend.c ##
     @@ refs/files-backend.c: static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
      		goto cleanup;
      
      	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
     -		struct fsck_ref_report report = { .path = NULL };
    -+		struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
    ++		struct fsck_ref_report report = {0};
      
      		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
      		report.path = sb.buf;
2:  714284cf2b ! 2:  7b6f4145cd ref: add regular ref content check for files backend
    @@ Metadata
      ## Commit message ##
         ref: add regular ref content check for files backend
     
    -    We implicitly reply on "git-fsck(1)" to check the consistency of regular
    -    refs. However, when parsing the regular refs for files backend, we allow
    -    the ref content to end with no newline or contain some garbages. We
    -    should warn the user about above situations.
    +    We implicitly rely on "git-fsck(1)" to check the consistency of regular
    +    refs. However, when parsing the regular refs for files backend by using
    +    "files-backend.c::parse_loose_ref_contents", we allow the ref content to
    +    be end with no newline or contain some garbages.
     
    -    In order to provide above functionality, enhance the "git-refs verify"
    -    command by adding consistency check for regular refs for files backend.
    +    It may seem that we should report an error or warn fsck message to the
    +    user about above situations. However, there may be some third-party
    +    tools customizing the content of refs. We should not report an error
    +    fsck message.
     
    -    Add the following three fsck messages to represent the above situations:
    +    And we cannot either report a warn fsck message to the user. This is
    +    because for "git-receive-pack(1)" and "git-fetch-pack(1)", they will
    +    parse the fsck message type and check the message type by
    +    "fsck.c::is_valid_msg_type". Only the fsck infos are not valid. If we
    +    make the fsck message type to be warn, the user could upgrade the fsck
    +    warnings to errors. And the user can also set the "strict" field in
    +    "fsck_options" to upgrade the fsck warnings to errors.
     
    -    1. "badRefContent(ERROR)": A ref has a bad content.
    -    2. "refMissingNewline(WARN)": A valid ref does not end with newline.
    -    3. "trailingRefContent(WARN)": A ref has trailing contents.
    +    We should not allow the user to upgrade the fsck warnings to errors. It
    +    might cause compatibility issue which will break the legacy repository.
    +    So we add the following two fsck infos to represent the situation where
    +    the ref content ends without newline or has garbages:
    +
    +    1. "refMissingNewline(INFO)": A valid ref does not end with newline.
    +    2. "trailingRefContent(INFO)": A ref has trailing contents.
    +
    +    In "fsck.c::fsck_vreport", we will convert "FSCK_INFO" to "FSCK_WARN",
    +    and we can still warn the user about these situations when using
    +    "git-refs verify" without introducing compatibility issue.
    +
    +    In current "git-fsck(1)", it will report an error when the ref content
    +    is bad, so we should following this to report an error to the user when
    +    "parse_loose_ref_contents" fails. And we add a new fsck error message
    +    called "badRefContent(ERROR)" to represent that a ref has a bad content.
     
         In order to tell whether the ref has trailing content, add a new
         parameter "trailing" to "parse_loose_ref_contents". Then introduce a new
    -    function "files_fsck_refs_content" to check the regular refs.
    +    function "files_fsck_refs_content" to check the regular refs to enhance
    +    the "git-refs verify".
     
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
    @@ Documentation/fsck-msgids.txt
      	(WARN) Tree contains entries pointing to a null sha1.
      
     +`refMissingNewline`::
    -+	(WARN) A valid ref does not end with newline.
    ++	(INFO) A valid ref does not end with newline.
     +
     +`trailingRefContent`::
    -+	(WARN) A ref has trailing contents.
    ++	(INFO) A ref has trailing contents.
     +
      `treeNotSorted`::
      	(ERROR) A tree is not properly sorted.
    @@ fsck.h: enum fsck_msg_type {
      	FUNC(BAD_REF_NAME, ERROR) \
      	FUNC(BAD_TIMEZONE, ERROR) \
     @@ fsck.h: enum fsck_msg_type {
    - 	FUNC(HAS_DOTDOT, WARN) \
    - 	FUNC(HAS_DOTGIT, WARN) \
    - 	FUNC(NULL_SHA1, WARN) \
    -+	FUNC(REF_MISSING_NEWLINE, WARN) \
    -+	FUNC(TRAILING_REF_CONTENT, WARN) \
    - 	FUNC(ZERO_PADDED_FILEMODE, WARN) \
    - 	FUNC(NUL_IN_COMMIT, WARN) \
    - 	FUNC(LARGE_PATHNAME, WARN) \
    + 	FUNC(MAILMAP_SYMLINK, INFO) \
    + 	FUNC(BAD_TAG_NAME, INFO) \
    + 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
    ++	FUNC(REF_MISSING_NEWLINE, INFO) \
    ++	FUNC(TRAILING_REF_CONTENT, INFO) \
    + 	/* ignored (elevated when requested) */ \
    + 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
    + 
     
      ## refs.c ##
     @@ refs.c: static int refs_read_special_head(struct ref_store *ref_store,
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +				   const char *refs_check_dir,
     +				   struct dir_iterator *iter)
     +{
    -+	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
     +	struct strbuf ref_content = STRBUF_INIT;
     +	struct strbuf referent = STRBUF_INIT;
     +	struct strbuf refname = STRBUF_INIT;
    ++	struct fsck_ref_report report = {0};
     +	const char *trailing = NULL;
     +	unsigned int type = 0;
     +	int failure_errno = 0;
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +		}
     +
     +		if (parse_loose_ref_contents(ref_store->repo->hash_algo,
    -+					    ref_content.buf, &oid, &referent,
    -+					    &type, &trailing, &failure_errno)) {
    ++					     ref_content.buf, &oid, &referent,
    ++					     &type, &trailing, &failure_errno)) {
     +			ret = fsck_report_ref(o, &report,
     +					      FSCK_MSG_BAD_REF_CONTENT,
     +					      "invalid ref content");
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +				goto cleanup;
     +			}
     +		}
    ++		goto cleanup;
     +	}
     +
     +cleanup:
3:  032b0d6a64 ! 3:  20d8556902 ref: add symbolic ref content check for files backend
    @@ Commit message
         3. "ref: refs/heads/master\n\n"
     
         But we do not allow any non-null trailing garbage. The following are bad
    -    symbolic contents.
    +    symbolic contents which will be reported as fsck error by "git-fsck(1)".
     
         1. "ref: refs/heads/master garbage\n"
         2. "ref: refs/heads/master \n\n\n garbage  "
     
    -    In order to provide above checks, we will traverse the "pointee" to
    -    report the user whether this is null-garbage or no newline. And if
    -    symbolic refs contain non-null garbage, we will report
    -    "FSCK_MSG_BAD_REF_CONTENT" to the user.
    -
    -    Then, we will check the name of the "pointee" is correct by using
    -    "check_refname_format". And then if we can access the "pointee_path" in
    -    the file system, we should ensure that the file type is correct.
    +    In order to provide above checks, we will use "strrchr" to check whether
    +    we have newline in the ref content. Then we will check the name of the
    +    "pointee" is correct by using "check_refname_format". If the function
    +    fails, we need to trim the "pointee" to see whether the null-garbage
    +    causes the function fails. If so, we need to report that there is
    +    null-garbage in the symref content. Otherwise, we should report the user
    +    the "pointee" is bad.
     
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +				    struct strbuf *pointee_name,
     +				    struct strbuf *pointee_path)
     +{
    -+	unsigned int newline_num = 0;
    -+	unsigned int space_num = 0;
    ++	const char *newline_pos = NULL;
     +	const char *p = NULL;
     +	struct stat st;
     +	int ret = 0;
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +		goto out;
     +	}
     +
    -+	while (*p != '\0') {
    -+		if ((space_num || newline_num) && !isspace(*p)) {
    -+			ret = fsck_report_ref(o, report,
    -+					      FSCK_MSG_BAD_REF_CONTENT,
    -+					      "contains non-null garbage");
    -+			goto out;
    -+		}
    -+
    -+		if (*p == '\n') {
    -+			newline_num++;
    -+		} else if (*p == ' ') {
    -+			space_num++;
    -+		}
    -+		p++;
    -+	}
    -+
    -+	if (space_num || newline_num > 1) {
    -+		ret = fsck_report_ref(o, report,
    -+				      FSCK_MSG_TRAILING_REF_CONTENT,
    -+				      "trailing null-garbage");
    -+	} else if (!newline_num) {
    ++	newline_pos = strrchr(p, '\n');
    ++	if (!newline_pos || *(newline_pos + 1)) {
     +		ret = fsck_report_ref(o, report,
     +				      FSCK_MSG_REF_MISSING_NEWLINE,
     +				      "missing newline");
     +	}
     +
    -+	strbuf_rtrim(pointee_name);
    -+
     +	if (check_refname_format(pointee_name->buf, 0)) {
    ++		/*
    ++		 * When containing null-garbage, "check_refname_format" will
    ++		 * fail, we should trim the "pointee" to check again.
    ++		 */
    ++		strbuf_rtrim(pointee_name);
    ++		if (!check_refname_format(pointee_name->buf, 0)) {
    ++			ret = fsck_report_ref(o, report,
    ++					      FSCK_MSG_TRAILING_REF_CONTENT,
    ++					      "trailing null-garbage");
    ++			goto out;
    ++		}
    ++
     +		ret = fsck_report_ref(o, report,
     +				      FSCK_MSG_BAD_SYMREF_POINTEE,
     +				      "points to refname with invalid format");
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
      				   const char *refs_check_dir,
      				   struct dir_iterator *iter)
      {
    - 	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
     +	struct strbuf pointee_path = STRBUF_INIT;
      	struct strbuf ref_content = STRBUF_INIT;
      	struct strbuf referent = STRBUF_INIT;
    @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_s
     +						       &referent,
     +						       &pointee_path);
      		}
    + 		goto cleanup;
      	}
    - 
     @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
      	strbuf_release(&refname);
      	strbuf_release(&ref_content);
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be che
     +	printf "ref: refs/heads/branch     " > $branch_dir_prefix/a/b/branch-trailing &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    ++	warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline
     +	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
     +	EOF
     +	rm $branch_dir_prefix/a/b/branch-trailing &&
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be che
     +	printf "ref: refs/heads/branch \n\n " > $branch_dir_prefix/a/b/branch-trailing &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    ++	warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline
     +	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
     +	EOF
     +	rm $branch_dir_prefix/a/b/branch-trailing &&
4:  147a873958 ! 4:  d9867c5f87 ref: add symlink ref consistency check for files backend
    @@ Metadata
     Author: shejialuo <shejialuo@gmail.com>
     
      ## Commit message ##
    -    ref: add symlink ref consistency check for files backend
    +    ref: add symlink ref check for files backend
     
         We have already introduced "files_fsck_symref_target". We should reuse
         this function to handle the symrefs which are legacy symbolic links. We
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +				    struct strbuf *pointee_path,
     +				    unsigned int symbolic_link)
      {
    - 	unsigned int newline_num = 0;
    - 	unsigned int space_num = 0;
    + 	const char *newline_pos = NULL;
    + 	const char *p = NULL;
     @@ refs/files-backend.c: static int files_fsck_symref_target(struct fsck_options *o,
      		goto out;
      	}
      
    --	while (*p != '\0') {
    --		if ((space_num || newline_num) && !isspace(*p)) {
    --			ret = fsck_report_ref(o, report,
    --					      FSCK_MSG_BAD_REF_CONTENT,
    --					      "contains non-null garbage");
    --			goto out;
    +-	newline_pos = strrchr(p, '\n');
    +-	if (!newline_pos || *(newline_pos + 1)) {
    +-		ret = fsck_report_ref(o, report,
    +-				      FSCK_MSG_REF_MISSING_NEWLINE,
    +-				      "missing newline");
     +	if (!symbolic_link) {
    -+		while (*p != '\0') {
    -+			if ((space_num || newline_num) && !isspace(*p)) {
    -+				ret = fsck_report_ref(o, report,
    -+						      FSCK_MSG_BAD_REF_CONTENT,
    -+						      "contains non-null garbage");
    -+				goto out;
    -+			}
    -+
    -+			if (*p == '\n') {
    -+				newline_num++;
    -+			} else if (*p == ' ') {
    -+				space_num++;
    -+			}
    -+			p++;
    - 		}
    - 
    --		if (*p == '\n') {
    --			newline_num++;
    --		} else if (*p == ' ') {
    --			space_num++;
    -+		if (space_num || newline_num > 1) {
    -+			ret = fsck_report_ref(o, report,
    -+					      FSCK_MSG_TRAILING_REF_CONTENT,
    -+					      "trailing null-garbage");
    -+		} else if (!newline_num) {
    ++		newline_pos = strrchr(p, '\n');
    ++		if (!newline_pos || *(newline_pos + 1)) {
     +			ret = fsck_report_ref(o, report,
     +					      FSCK_MSG_REF_MISSING_NEWLINE,
     +					      "missing newline");
    - 		}
    --		p++;
    --	}
    - 
    --	if (space_num || newline_num > 1) {
    --		ret = fsck_report_ref(o, report,
    --				      FSCK_MSG_TRAILING_REF_CONTENT,
    --				      "trailing null-garbage");
    --	} else if (!newline_num) {
    --		ret = fsck_report_ref(o, report,
    --				      FSCK_MSG_REF_MISSING_NEWLINE,
    --				      "missing newline");
    -+		strbuf_rtrim(pointee_name);
    ++		}
      	}
      
    --	strbuf_rtrim(pointee_name);
    --
      	if (check_refname_format(pointee_name->buf, 0)) {
    +-		/*
    +-		 * When containing null-garbage, "check_refname_format" will
    +-		 * fail, we should trim the "pointee" to check again.
    +-		 */
    +-		strbuf_rtrim(pointee_name);
    +-		if (!check_refname_format(pointee_name->buf, 0)) {
    +-			ret = fsck_report_ref(o, report,
    +-					      FSCK_MSG_TRAILING_REF_CONTENT,
    +-					      "trailing null-garbage");
    +-			goto out;
    ++		if (!symbolic_link) {
    ++			/*
    ++			* When containing null-garbage, "check_refname_format" will
    ++			* fail, we should trim the "pointee" to check again.
    ++			*/
    ++			strbuf_rtrim(pointee_name);
    ++			if (!check_refname_format(pointee_name->buf, 0)) {
    ++				ret = fsck_report_ref(o, report,
    ++						      FSCK_MSG_TRAILING_REF_CONTENT,
    ++						      "trailing null-garbage");
    ++				goto out;
    ++			}
    + 		}
    + 
      		ret = fsck_report_ref(o, report,
    - 				      FSCK_MSG_BAD_SYMREF_POINTEE,
     @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
    - 	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
    + {
      	struct strbuf pointee_path = STRBUF_INIT;
      	struct strbuf ref_content = STRBUF_INIT;
     +	struct strbuf abs_gitdir = STRBUF_INIT;
      	struct strbuf referent = STRBUF_INIT;
      	struct strbuf refname = STRBUF_INIT;
    + 	struct fsck_ref_report report = {0};
    ++	const char *pointee_name = NULL;
     +	unsigned int symbolic_link = 0;
      	const char *trailing = NULL;
      	unsigned int type = 0;
    @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_s
     -						       &pointee_path);
     +						       &pointee_path,
     +						       symbolic_link);
    -+		}
    -+	} else if (S_ISLNK(iter->st.st_mode)) {
    -+		const char *pointee_name = NULL;
    + 		}
    + 		goto cleanup;
    + 	}
    + 
    ++	symbolic_link = 1;
     +
    -+		symbolic_link = 1;
    ++	strbuf_add_real_path(&pointee_path, iter->path.buf);
    ++	strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
    ++	strbuf_normalize_path(&abs_gitdir);
    ++	if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
    ++		strbuf_addch(&abs_gitdir, '/');
     +
    -+		strbuf_add_real_path(&pointee_path, iter->path.buf);
    -+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
    -+		strbuf_normalize_path(&abs_gitdir);
    -+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
    -+			strbuf_addch(&abs_gitdir, '/');
    ++	if (!skip_prefix(pointee_path.buf, abs_gitdir.buf, &pointee_name)) {
    ++		ret = fsck_report_ref(o, &report,
    ++				      FSCK_MSG_BAD_SYMREF_POINTEE,
    ++				      "point to target outside gitdir");
    ++		goto cleanup;
    ++	}
     +
    -+		if (!skip_prefix(pointee_path.buf,
    -+				 abs_gitdir.buf, &pointee_name)) {
    -+			ret = fsck_report_ref(o, &report,
    -+					       FSCK_MSG_BAD_SYMREF_POINTEE,
    -+					       "point to target outside gitdir");
    -+			goto cleanup;
    - 		}
    ++	strbuf_addstr(&referent, pointee_name);
    ++	ret = files_fsck_symref_target(o, &report, refname.buf,
    ++				       &referent, &pointee_path,
    ++				       symbolic_link);
     +
    -+		strbuf_addstr(&referent, pointee_name);
    -+		ret = files_fsck_symref_target(o, &report, refname.buf,
    -+					       &referent, &pointee_path,
    -+					       symbolic_link);
    - 	}
    - 
      cleanup:
    -@@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
    + 	strbuf_release(&refname);
      	strbuf_release(&ref_content);
      	strbuf_release(&referent);
      	strbuf_release(&pointee_path);
-- 
2.46.0



* [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
@ 2024-08-27 16:07     ` shejialuo
  2024-08-27 17:49       ` Junio C Hamano
  2024-08-27 16:07     ` [PATCH v2 2/4] ref: add regular ref content check for files backend shejialuo
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-08-27 16:07 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So we must always initialize these fields to NULL,
instead of letting them point anywhere, when creating a new
"fsck_ref_report" structure.

The original code explicitly names only the ".path" field when
initializing the "fsck_ref_report" structure, which makes it unclear how
the other fields are initialized. To avoid this confusion, initialize
"fsck_ref_report" with zero to make it clear that every field in
"fsck_ref_report" is zero-initialized.
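For reference, C zero-initializes every member that a brace initializer does not mention, so both the old `{ .path = NULL }` and the new `{0}` leave "oid" and "referent" NULL; the change makes that explicit to the reader rather than changing behavior. The sketch below uses a hypothetical stand-in struct, not the real "fsck_ref_report" definition; its field types are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for "struct fsck_ref_report": the field names
 * mirror the ones discussed above, the types are assumptions.
 */
struct report_sketch {
	const char *path;
	const void *oid;
	const char *referent;
};

/* Return 1 when every pointer member is NULL, 0 otherwise. */
int report_sketch_is_zeroed(const struct report_sketch *r)
{
	return r->path == NULL && r->oid == NULL && r->referent == NULL;
}

/* Both initializer styles yield an all-NULL struct. */
void check_initializers(void)
{
	/* Old style: only ".path" is named; the rest are implicitly zeroed. */
	struct report_sketch old_style = { .path = NULL };
	/* New style: the universal zero initializer. */
	struct report_sketch new_style = {0};

	assert(report_sketch_is_zeroed(&old_style));
	assert(report_sketch_is_zeroed(&new_style));
}
```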

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8d6ec9458d..d6fc3bd67c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3446,7 +3446,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = {0};
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.46.0



* [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
  2024-08-27 16:07     ` [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-08-27 16:07     ` shejialuo
  2024-08-27 16:19       ` shejialuo
                         ` (2 more replies)
  2024-08-27 16:08     ` [PATCH v2 3/4] ref: add symbolic " shejialuo
                       ` (4 subsequent siblings)
  6 siblings, 3 replies; 209+ messages in thread
From: shejialuo @ 2024-08-27 16:07 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We implicitly rely on "git-fsck(1)" to check the consistency of regular
refs. However, when parsing regular refs for the files backend with
"files-backend.c::parse_loose_ref_contents", we allow the ref content to
end without a newline or to contain trailing garbage.

It may seem that we should report an error or a warning fsck message to
the user in these situations. However, some third-party tools may
customize the content of refs, so we should not report an error fsck
message.

Nor can we report a warning fsck message, because the caller may set the
"strict" field in "fsck_options" to upgrade fsck warnings to errors.

We should not allow these situations to be upgraded to errors that way:
it might cause compatibility issues that break legacy repositories. So we
add the following two fsck infos to represent the situations where the
ref content ends without a newline or has trailing garbage:

1. "refMissingNewline(INFO)": A valid ref does not end with a newline.
2. "trailingRefContent(INFO)": A ref has trailing content.

Because "fsck.c::fsck_vreport" converts "FSCK_INFO" to "FSCK_WARN", we
can still warn the user about these situations when using
"git-refs verify" without introducing compatibility issues.
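To make the reasoning above concrete, here is a simplified model of the severity handling. It assumes, as this message relies on, that "strict" only promotes messages whose default type is WARN, and that "fsck_vreport" prints INFO messages as warnings afterwards. The enum and function names are hypothetical and do not exist in fsck.c.

```c
#include <assert.h>

/* Simplified model of fsck message severities; names are hypothetical. */
enum sketch_severity { SKETCH_ERROR, SKETCH_WARN, SKETCH_INFO, SKETCH_IGNORE };

/* Step 1: pick the default severity; "strict" promotes only WARN. */
enum sketch_severity sketch_lookup(enum sketch_severity dflt, int strict)
{
	if (strict && dflt == SKETCH_WARN)
		return SKETCH_ERROR;
	return dflt;
}

/* Step 2: when the message is emitted, INFO is printed as a warning. */
enum sketch_severity sketch_emit(enum sketch_severity type)
{
	return type == SKETCH_INFO ? SKETCH_WARN : type;
}

/* Severity the user finally sees for a message with default "dflt". */
enum sketch_severity sketch_report(enum sketch_severity dflt, int strict)
{
	return sketch_emit(sketch_lookup(dflt, strict));
}

void sketch_severity_selfcheck(void)
{
	/* An INFO message stays a warning even in strict mode... */
	assert(sketch_report(SKETCH_INFO, 1) == SKETCH_WARN);
	/* ...while a WARN message would become an error. */
	assert(sketch_report(SKETCH_WARN, 1) == SKETCH_ERROR);
}
```

An explicit per-message setting such as "-c fsck.trailingRefContent=error" is outside this model; one of the tests in this patch uses it to turn the info into an error on purpose.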

The current "git-fsck(1)" reports an error when the ref content is bad,
so we follow this and report an error to the user when
"parse_loose_ref_contents" fails. To do so, add a new fsck error message
called "badRefContent(ERROR)" to represent that a ref has bad content.

To tell whether the ref has trailing content, add a new parameter
"trailing" to "parse_loose_ref_contents". Then introduce a new function
"files_fsck_refs_content" to check regular refs, enhancing
"git-refs verify".
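The trailing-content check that "files_fsck_refs_content" performs on regular refs can be sketched as a standalone function. It assumes "trailing" points just past the hex object ID, as the new "trailing" out-parameter of "parse_loose_ref_contents" does; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical result codes for the sketch. */
enum trailing_result {
	TRAILING_OK,              /* exactly one "\n" after the object ID */
	TRAILING_MISSING_NEWLINE, /* nothing after the object ID */
	TRAILING_GARBAGE          /* anything else after the object ID */
};

/* "trailing" is assumed to point just past the hex object ID. */
enum trailing_result classify_trailing(const char *trailing)
{
	if (*trailing == '\0')
		return TRAILING_MISSING_NEWLINE;
	if (trailing[0] != '\n' || trailing[1] != '\0')
		return TRAILING_GARBAGE;
	return TRAILING_OK;
}

/* The suffixes used by the tests in this patch, classified. */
void classify_trailing_selfcheck(void)
{
	assert(classify_trailing("\n") == TRAILING_OK);
	assert(classify_trailing("") == TRAILING_MISSING_NEWLINE);
	assert(classify_trailing(" garbage") == TRAILING_GARBAGE);
	assert(classify_trailing("\n\n\n") == TRAILING_GARBAGE);
	assert(classify_trailing("    garbage\n\na") == TRAILING_GARBAGE);
}
```

The order matters: an empty suffix is reported as a missing newline before the stricter "exactly one newline" test runs, which matches the order of the checks in the patch.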

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 ++++
 fsck.h                        |  3 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 68 ++++++++++++++++++++++++++-
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 87 +++++++++++++++++++++++++++++++++++
 6 files changed, 167 insertions(+), 4 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..fc074fc571 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
@@ -170,6 +173,12 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A valid ref does not end with a newline.
+
+`trailingRefContent`::
+	(INFO) A ref has trailing content.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
@@ -84,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 74de3d3009..5e74881945 100644
--- a/refs.c
+++ b/refs.c
@@ -1758,7 +1758,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d6fc3bd67c..69c00073eb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -560,7 +560,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -597,7 +597,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -619,6 +619,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	struct fsck_ref_report report = {0};
+	const char *trailing = NULL;
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISREG(iter->st.st_mode)) {
+		if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+			ret = error_errno(_("%s/%s: unable to read the ref"),
+					  refs_check_dir, iter->relative_path);
+			goto cleanup;
+		}
+
+		if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+					     ref_content.buf, &oid, &referent,
+					     &type, &trailing, &failure_errno)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_BAD_REF_CONTENT,
+					      "invalid ref content");
+			goto cleanup;
+		}
+
+		if (!(type & REF_ISSYMREF)) {
+			if (*trailing == '\0') {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_REF_MISSING_NEWLINE,
+						      "missing newline");
+				goto cleanup;
+			}
+
+			if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_TRAILING_REF_CONTENT,
+						      "trailing garbage in ref");
+				goto cleanup;
+			}
+		}
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3575,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..7c1910d784 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,91 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git commit --allow-empty -m second &&
+	git checkout -b branch-2 &&
+	git tag tag-2 &&
+	git checkout -b a/b/tag-2 &&
+
+	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-1-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/branch-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\n\na" "$(git rev-parse tag-2)" > $tag_dir_prefix/tag-2-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-2-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-2-garbage &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%sx" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-1-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $tag_dir_prefix/tag-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-2-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $branch_dir_prefix/a/b/branch-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-2-bad &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.0



* [PATCH v2 3/4] ref: add symbolic ref content check for files backend
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
  2024-08-27 16:07     ` [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-08-27 16:07     ` [PATCH v2 2/4] ref: add regular ref content check for files backend shejialuo
@ 2024-08-27 16:08     ` shejialuo
  2024-08-27 19:19       ` Junio C Hamano
  2024-08-28 12:50       ` Patrick Steinhardt
  2024-08-27 16:08     ` [PATCH v2 4/4] ref: add symlink ref " shejialuo
                       ` (3 subsequent siblings)
  6 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-08-27 16:08 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced the checks for regular refs. There is no need
to check the consistency of the target which the symbolic ref points to.
Instead, we just check the content of the symbolic ref itself.

In order to check the content of the symbolic ref, create a function
"files_fsck_symref_target". It first checks whether the "pointee" is
under the "refs/" directory and then checks the "pointee" itself.

There is no specification for the content of a symbolic ref. Although
"git-symbolic-ref(1)" writes "ref: %s\n" when creating a symbolic ref,
this format is not mandatory, and we still accept symbolic refs with
trailing null garbage (trailing whitespace). More specifically, the
following are accepted, although they may still be reported as fsck
infos:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

However, we do not allow any non-null trailing garbage. The following
are bad symbolic ref contents, which "git-fsck(1)" reports as errors:

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

To implement the above checks, we use "strrchr" to check whether the ref
content ends with a newline. Then we check whether the name of the
"pointee" is valid by using "check_refname_format". If that fails, we
trim the "pointee" and check again to see whether the null garbage
caused the failure. If so, we report that there is null garbage in the
symref content. Otherwise, we report to the user that the "pointee" is
bad.
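The checks described above can be sketched as a standalone function. This is an illustration under assumptions, not the patch's code: the flag names are hypothetical, "toy_refname_check" is a heavily simplified stand-in for "check_refname_format", and the input is assumed to be the symref content after "ref:" and any leading spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical flags; the patch reports fsck messages instead. */
#define SYMREF_MISSING_NEWLINE  0x1 /* like refMissingNewline */
#define SYMREF_TRAILING_GARBAGE 0x2 /* like trailingRefContent */
#define SYMREF_BAD_POINTEE      0x4 /* like badSymrefPointee */

/*
 * Toy stand-in for check_refname_format(): it only rejects empty names,
 * control characters, spaces and components starting with '.'. The real
 * function applies many more rules. Returns 0 when the name looks valid.
 */
static int toy_refname_check(const char *name, size_t len)
{
	size_t i;

	if (!len)
		return -1;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (c <= ' ' || c == 0x7f)
			return -1;
		if (c == '.' && (i == 0 || name[i - 1] == '/'))
			return -1;
	}
	return 0;
}

/* "pointee" is the symref content after "ref:" and leading spaces. */
int check_symref_content(const char *pointee)
{
	const char *newline_pos;
	size_t len = strlen(pointee);
	int flags = 0;

	if (strncmp(pointee, "refs/", 5))
		return SYMREF_BAD_POINTEE;

	/* The last newline must be the final character. */
	newline_pos = strrchr(pointee, '\n');
	if (!newline_pos || newline_pos[1] != '\0')
		flags |= SYMREF_MISSING_NEWLINE;

	/* Sketch choice: drop one terminating newline before the check. */
	if (len && pointee[len - 1] == '\n')
		len--;

	if (toy_refname_check(pointee, len)) {
		/* Trim trailing whitespace ("null garbage") and retry. */
		while (len && isspace((unsigned char)pointee[len - 1]))
			len--;
		if (!toy_refname_check(pointee, len))
			flags |= SYMREF_TRAILING_GARBAGE;
		else
			flags |= SYMREF_BAD_POINTEE;
	}
	return flags;
}

void check_symref_content_selfcheck(void)
{
	assert(check_symref_content("refs/heads/master\n") == 0);
	assert(check_symref_content("refs/heads/master   ") ==
	       (SYMREF_MISSING_NEWLINE | SYMREF_TRAILING_GARBAGE));
	assert(check_symref_content("refs/heads/master\n\n") ==
	       SYMREF_TRAILING_GARBAGE);
	assert(check_symref_content("refs/heads/master garbage\n") ==
	       SYMREF_BAD_POINTEE);
}
```

This sketch deliberately drops a single terminating newline before the format check, so the canonical "ref: %s\n" content written by "git-symbolic-ref(1)" yields no flags; only the whitespace left after that is classified as null garbage.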

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 77 +++++++++++++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 54 ++++++++++++++++++++++++
 4 files changed, 135 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index fc074fc571..85fd058c81 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badSymrefPointee`::
+	(ERROR) The pointee of a symref is bad.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..cbe837f84c 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_SYMREF_POINTEE, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 69c00073eb..382c73fcf7 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3434,11 +3434,81 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+/*
+ * Check the symref "pointee_name" and "pointee_path". The caller should
+ * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
+ * would be the content after "refs:".
+ */
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    const char *refname,
+				    struct strbuf *pointee_name,
+				    struct strbuf *pointee_path)
+{
+	const char *newline_pos = NULL;
+	const char *p = NULL;
+	struct stat st;
+	int ret = 0;
+
+	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
+
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to ref outside the refs directory");
+		goto out;
+	}
+
+	newline_pos = strrchr(p, '\n');
+	if (!newline_pos || *(newline_pos + 1)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "missing newline");
+	}
+
+	if (check_refname_format(pointee_name->buf, 0)) {
+		/*
+		 * When containing null-garbage, "check_refname_format" will
+		 * fail, we should trim the "pointee" to check again.
+		 */
+		strbuf_rtrim(pointee_name);
+		if (!check_refname_format(pointee_name->buf, 0)) {
+			ret = fsck_report_ref(o, report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "trailing null-garbage");
+			goto out;
+		}
+
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to refname with invalid format");
+	}
+
+	/*
+	 * Missing target should not be treated as any error worthy event and
+	 * not even warn. It is a common case that a symbolic ref points to a
+	 * ref that does not exist yet. If the target ref does not exist, just
+	 * skip the check for the file type.
+	 */
+	if (lstat(pointee_path->buf, &st) < 0)
+		goto out;
+
+	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to an invalid file type");
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
 				   struct dir_iterator *iter)
 {
+	struct strbuf pointee_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
@@ -3482,6 +3552,12 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 						      "trailing garbage in ref");
 				goto cleanup;
 			}
+		} else {
+			strbuf_addf(&pointee_path, "%s/%s",
+				    ref_store->gitdir, referent.buf);
+			ret = files_fsck_symref_target(o, &report, refname.buf,
+						       &referent,
+						       &pointee_path);
 		}
 		goto cleanup;
 	}
@@ -3490,6 +3566,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&pointee_path);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 7c1910d784..69280795ca 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -176,4 +176,58 @@ test_expect_success 'regular ref content should be checked' '
 	test_cmp expect err
 '
 
+test_expect_success 'symbolic ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git checkout -b a/b/branch-2 &&
+
+	printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-1-no-newline &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n\n " > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" > $branch_dir_prefix/branch-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-2-bad: badSymrefPointee: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-2-bad &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.0



* [PATCH v2 4/4] ref: add symlink ref check for files backend
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
                       ` (2 preceding siblings ...)
  2024-08-27 16:08     ` [PATCH v2 3/4] ref: add symbolic " shejialuo
@ 2024-08-27 16:08     ` shejialuo
  2024-08-28 18:42     ` [PATCH] SQUASH??? remove unused parameters Junio C Hamano
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-27 16:08 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced "files_fsck_symref_target". Reuse this
function to handle symrefs that are legacy symbolic links. Since trailing
garbage should not be checked for symbolic links, add a new parameter
"symbolic_link" to disable the checks that only apply to symbolic refs.

We first use "strbuf_add_real_path" to resolve the symlink and get the
absolute path "pointee_path" that the symlink ref points to. Then we get
the absolute path "abs_gitdir" of the gitdir. By stripping "abs_gitdir"
from the front of "pointee_path", we can extract the "referent". Thus, we
can reuse the "files_fsck_symref_target" function to seamlessly check
symlink refs.
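A minimal, self-contained sketch of the prefix-stripping step described
above. It assumes POSIX realpath(3) in place of Git's internal
strbuf_add_real_path() and strbuf_add_absolute_path() helpers, and the
names referent_from_paths() and symlink_referent() are hypothetical, not
part of Git. Note that realpath(3) fails when the final target does not
exist, so this sketch returns NULL for a dangling link.

```c
#define _XOPEN_SOURCE 700 /* for realpath(3) and strdup(3) */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given the resolved absolute path of the pointee
 * and the absolute gitdir, return a pointer just past "<gitdir>/", or
 * NULL when the pointee lies outside gitdir. This mirrors the
 * skip_prefix() step in the patch.
 */
static const char *referent_from_paths(const char *pointee_path,
				       const char *abs_gitdir)
{
	size_t len;

	assert(pointee_path && abs_gitdir);
	len = strlen(abs_gitdir);
	/* Ignore trailing slashes so "/r/.git/" and "/r/.git" behave alike. */
	while (len > 0 && abs_gitdir[len - 1] == '/')
		len--;
	/* Require a '/' right after the prefix so "/r/.gitx" does not match. */
	if (strncmp(pointee_path, abs_gitdir, len) || pointee_path[len] != '/')
		return NULL; /* outside gitdir: would be a badSymrefPointee report */
	return pointee_path + len + 1;
}

/*
 * Hypothetical wrapper: resolve the symlink ref and the gitdir with
 * realpath(3), then extract the referent. Returns a malloc'd string the
 * caller must free(), or NULL on failure or when outside gitdir.
 */
static char *symlink_referent(const char *ref_path, const char *gitdir)
{
	char pointee[PATH_MAX], abs_gitdir[PATH_MAX];
	const char *rel;

	if (!realpath(ref_path, pointee) || !realpath(gitdir, abs_gitdir))
		return NULL;
	rel = referent_from_paths(pointee, abs_gitdir);
	return rel ? strdup(rel) : NULL;
}
```

With a gitdir of "/r/.git", a link resolving to "/r/.git/refs/heads/main"
yields "refs/heads/main", while one resolving to "/r/branch" yields NULL,
which corresponds to the "point to target outside gitdir" case in the test.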

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     | 68 +++++++++++++++++++++++++++++-----------
 t/t0602-reffiles-fsck.sh | 44 ++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 18 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 382c73fcf7..8641e3ba65 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,4 +1,5 @@
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../copy.h"
 #include "../environment.h"
 #include "../gettext.h"
@@ -3437,13 +3438,15 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 /*
  * Check the symref "pointee_name" and "pointee_path". The caller should
  * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
- * would be the content after "refs:".
+ * would be the content after "refs:". For symbolic link, "pointee_name" would
+ * be the relative path against "gitdir".
  */
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    const char *refname,
 				    struct strbuf *pointee_name,
-				    struct strbuf *pointee_path)
+				    struct strbuf *pointee_path,
+				    unsigned int symbolic_link)
 {
 	const char *newline_pos = NULL;
 	const char *p = NULL;
@@ -3458,24 +3461,28 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	newline_pos = strrchr(p, '\n');
-	if (!newline_pos || *(newline_pos + 1)) {
-		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_REF_MISSING_NEWLINE,
-				      "missing newline");
+	if (!symbolic_link) {
+		newline_pos = strrchr(p, '\n');
+		if (!newline_pos || *(newline_pos + 1)) {
+			ret = fsck_report_ref(o, report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "missing newline");
+		}
 	}
 
 	if (check_refname_format(pointee_name->buf, 0)) {
-		/*
-		 * When containing null-garbage, "check_refname_format" will
-		 * fail, we should trim the "pointee" to check again.
-		 */
-		strbuf_rtrim(pointee_name);
-		if (!check_refname_format(pointee_name->buf, 0)) {
-			ret = fsck_report_ref(o, report,
-					      FSCK_MSG_TRAILING_REF_CONTENT,
-					      "trailing null-garbage");
-			goto out;
+		if (!symbolic_link) {
+			/*
+			 * When containing null-garbage, "check_refname_format" will
+			 * fail, we should trim the "pointee" to check again.
+			 */
+			strbuf_rtrim(pointee_name);
+			if (!check_refname_format(pointee_name->buf, 0)) {
+				ret = fsck_report_ref(o, report,
+						      FSCK_MSG_TRAILING_REF_CONTENT,
+						      "trailing null-garbage");
+				goto out;
+			}
 		}
 
 		ret = fsck_report_ref(o, report,
@@ -3510,9 +3517,12 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 {
 	struct strbuf pointee_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
+	const char *pointee_name = NULL;
+	unsigned int symbolic_link = 0;
 	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
@@ -3557,16 +3567,38 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				    ref_store->gitdir, referent.buf);
 			ret = files_fsck_symref_target(o, &report, refname.buf,
 						       &referent,
-						       &pointee_path);
+						       &pointee_path,
+						       symbolic_link);
 		}
 		goto cleanup;
 	}
 
+	symbolic_link = 1;
+
+	strbuf_add_real_path(&pointee_path, iter->path.buf);
+	strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+	strbuf_normalize_path(&abs_gitdir);
+	if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+		strbuf_addch(&abs_gitdir, '/');
+
+	if (!skip_prefix(pointee_path.buf, abs_gitdir.buf, &pointee_name)) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "point to target outside gitdir");
+		goto cleanup;
+	}
+
+	strbuf_addstr(&referent, pointee_name);
+	ret = files_fsck_symref_target(o, &report, refname.buf,
+				       &referent, &pointee_path,
+				       symbolic_link);
+
 cleanup:
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
 	strbuf_release(&pointee_path);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 69280795ca..36992fbc7f 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -230,4 +230,48 @@ test_expect_success 'symbolic ref content should be checked' '
 	test_cmp expect err
 '
 
+test_expect_success SYMLINKS 'symbolic ref (symbolic link) content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git checkout -b a/b/branch-2 &&
+
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: point to target outside gitdir
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./".branch" $branch_dir_prefix/branch-symbolic &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic: badSymrefPointee: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.0



* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-27 16:07     ` [PATCH v2 2/4] ref: add regular ref content check for files backend shejialuo
@ 2024-08-27 16:19       ` shejialuo
  2024-08-27 18:21       ` Junio C Hamano
  2024-08-28 12:50       ` Patrick Steinhardt
  2 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-27 16:19 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

On Wed, Aug 28, 2024 at 12:08:03AM +0800, shejialuo wrote:

> And we cannot either report a warn fsck message to the user. This is
> because if the caller set the "strict" field in "fsck_options" to
> to upgrade the fsck warnings to errors.
> 

Sorry about this paragraph. Because I changed the commit message of this
patch, the range-diff part is outdated, but the code is still the same,
so I hope this will not cause much trouble. The paragraph should read as
follows:

  Nor can we report a warning fsck message to the user, because if the
  caller sets the "strict" field in "fsck_options", fsck warnings will be
  converted to errors.

I will fix this in the next version once I receive enough feedback.

Thanks.



* Re: [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero
  2024-08-27 16:07     ` [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-08-27 17:49       ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-27 17:49 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> The original code explicitly specifies the ".path" field to initialize
> the "fsck_ref_report" structure. However, it introduces confusion how we
> initialize the other fields.

The above description is a bit stronger than what this patch is
actually fixing.  If you explicitly initialize any member of an
aggregate type, other members not mentioned will be implicitly
0-initialized, so the original does not give any confusion to
readers who know what they are reading.
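A small, self-contained illustration of the C rule stated above. The
struct below is a hypothetical stand-in, not Git's real struct
fsck_ref_report; both initializers leave every member zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a report struct with several members. */
struct report {
	const char *path;
	const char *oid_hex;
	int flags;
};

/* Designated initializer: only .path is named, the rest are implicitly zero. */
static struct report make_designated(void)
{
	struct report r = { .path = NULL };
	return r;
}

/* The customary "{ 0 }" idiom: the first member is set to 0, the rest are implicitly zero. */
static struct report make_zeroed(void)
{
	struct report r = { 0 };
	return r;
}

/* Returns 1 when every member of *r is zero / NULL. */
static int is_all_zero(const struct report *r)
{
	assert(r);
	return r->path == NULL && r->oid_hex == NULL && r->flags == 0;
}
```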

What the patch improves is that the common idiom used in this
code base (and possibly elsewhere) is to use "{ 0 }", instead
of explicitly saying "this particular member is 0-initialized".

    The original code explicitly initializes the "path" member in
    the "struct fsck_ref_report" to NULL (which implicitly
    0-initializes other members in the struct).  It is more
    customary to use "{ 0 }" to express that we are 0-initializing
    everything.

The patch is correct, but spelling it like "{ 0 }" with a space on both
sides is more common [*], and because this patch is all about making it
more idiomatic, let's write it that way.

Thanks.

[Footnote]

 * "git grep -e '{0};' -e '{ 0 };' '*.[ch]'" tells us so.


> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  refs/files-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8d6ec9458d..d6fc3bd67c 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3446,7 +3446,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
>  		goto cleanup;
>  
>  	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
> -		struct fsck_ref_report report = { .path = NULL };
> +		struct fsck_ref_report report = {0};
>  
>  		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
>  		report.path = sb.buf;


* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-27 16:07     ` [PATCH v2 2/4] ref: add regular ref content check for files backend shejialuo
  2024-08-27 16:19       ` shejialuo
@ 2024-08-27 18:21       ` Junio C Hamano
  2024-08-28 12:50         ` Patrick Steinhardt
  2024-08-28 14:31         ` shejialuo
  2024-08-28 12:50       ` Patrick Steinhardt
  2 siblings, 2 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-27 18:21 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> We implicitly rely on "git-fsck(1)" to check the consistency of regular
> refs. However, when parsing the regular refs for files backend by using
> "files-backend.c::parse_loose_ref_contents", we allow the ref content to
> be end with no newline or contain some garbages.

"to be end with" -> "to end with".
"or contain" -> "or to contain" (optional, I think).

Or "... the ref content without terminating newline, or with extra
bytes after the terminating newline."

> It may seem that we should report an error or warn fsck message to the
> user about above situations. However, there may be some third-party
> tools customizing the content of refs. We should not report an error
> fsck message.

    Even though we never created such loose refs ourselves, we have
    accepted such loose refs forever, so it is entirely possible
    that third-party tools may rely on such loose refs being valid.
    Let's notice such "curiously formatted" loose ref files and
    tell the users our findings, so that we can assess the possible
    extent of damage if/when we retroactively tighten the parsing
    rules in the future.

> We should not allow the user to upgrade the fsck warnings to errors. It
> might cause compatibility issue which will break the legacy repository.

I am not sure this is a right thing to say.  If the user wants to
ensure that the tool they use in their repository, which may include
some third-party reimplementation of Git, would never create such a
(semi-)malformed loose ref files, it is within their right, and it
is the most reasonable way, to promote these "curiously formatted
loose ref" fsck warnings to errors.

Is your "We should not allow" above backed by code that prevents
them from promoting the warnings to errors, or is it merely a
declaration of your intention?

> So we add the following two fsck infos to represent the situation where
> the ref content ends without newline or has garbages:
>
> 1. "refMissingNewline(INFO)": A valid ref does not end with newline.
> 2. "trailingRefContent(INFO)": A ref has trailing contents.

OK.

> In "fsck.c::fsck_vreport", we will convert "FSCK_INFO" to "FSCK_WARN",
> and we can still warn the user about these situations when using
> "git-refs verify" without introducing compatibility issue.

OK.

> In current "git-fsck(1)", it will report an error when the ref content
> is bad, so we should following this to report an error to the user when
> "parse_loose_ref_contents" fails. And we add a new fsck error message
> called "badRefContent(ERROR)" to represent that a ref has a bad content.

Good.

> @@ -170,6 +173,12 @@
>  `nullSha1`::
>  	(WARN) Tree contains entries pointing to a null sha1.
>  
> +`refMissingNewline`::
> +	(INFO) A valid ref does not end with newline.
> +
> +`trailingRefContent`::
> +	(INFO) A ref has trailing contents.
> +
>  `treeNotSorted`::
>  	(ERROR) A tree is not properly sorted.

There is no mention of "you shouldn't promote these to error" here,
which is good.  But wouldn't we want to tell users to report such
curiously formatted loose refs, after figuring out who created them,
to help us to eventually make the check stricter in the future?

Git 3.0 boundary might be a good time to tighten interoperability
rules such that we won't accept anything we wouldn't have written
ourselves (not limited to loose ref format, but this applies to
anything on-disk or on-wire), but we'd need enough preparation if we
want to be able to do so in the future.

Thanks.




* Re: [PATCH v2 3/4] ref: add symbolic ref content check for files backend
  2024-08-27 16:08     ` [PATCH v2 3/4] ref: add symbolic " shejialuo
@ 2024-08-27 19:19       ` Junio C Hamano
  2024-08-28 15:26         ` shejialuo
  2024-08-28 12:50       ` Patrick Steinhardt
  1 sibling, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-27 19:19 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> We have already introduced the checks for regular refs. There is no need
> to check the consistency of the target which the symbolic ref points to.
> Instead, we just check the content of the symbolic ref itself.

Just in case you need it in the future, if you ever need to refer to
a symbolic ref in a way that it is clear which of the two kinds you
are talking about, you can say "textual symref" (a regular file
whose contents is "ref: " followed by the target), to contrast them
with "symbolic link used as symref".

In the proposed log message of this commit, all references to
"symbolic ref" talk about textual ones, so I do not see any need to
be extra explicit by saying "textual symref".

> In order to check the content of the symbolic ref, create a function
> "files_fsck_symref_target". It will first check whether the "pointee" is
> under the "refs/" directory and then we will check the "pointee" itself.

Hmph, as the pointee must be within the usual places that you would
find refs (either in refs/ directory or pseudo ref files immediately
below $GIT_DIR), wouldn't we check the pointee when fsck (or "git
refs verify") run and check everything?  The pointee will have its
turn to be checked, and I am not sure why you need to check the
pointee when you find a symbolic ref is pointing at it, which will
lead to it being checked twice (or more).

I however did not find an additional code to "check the pointee itself"
in the patch, so perhaps it is OK---the only thing that needs fixing
may be the above paragraph if that is the case.

> There is no specification about the content of the symbolic ref.
> Although we do write "ref: %s\n" to create a symbolic ref by using
> "git-symbolic-ref(1)" command. However, this is not mandatory. We still
> accept symbolic refs with null trailing garbage. Put it more specific,
> the following are correct:
>
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
>
> But we do not allow any non-null trailing garbage.

Your use of word "null" is probably too confusing to contributors to
this project.  None of the above has NUL bytes in them.  I think you
want to say something like this:

    A regular file is accepted as a textual symbolic ref if it
    begins with "ref:", followed by zero or more whitespaces,
    followed by the full refname (e.g. "refs/heads/master",
    "refs/tags/v1.0"), followed only by whitespace characters.  We
    always write a single SP after "ref:" and a single LF after the
    full refname, but third-party reimplementations of Git may have
    taken advantage of the looser syntax that is allowed as above.
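A sketch of the accepted syntax described above, including trimming of
every trailing isspace() byte as strbuf_rtrim() does. The name
parse_textual_symref() is hypothetical, and the "no whitespace inside the
refname" test is only a simplified stand-in for check_refname_format(),
not Git's real refname rules.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Accept "ref:", optional whitespace, the full refname, then only
 * whitespace. On success, copy the refname into out (NUL-terminated)
 * and return 0; return -1 when the content is rejected or does not fit.
 */
static int parse_textual_symref(const char *content, char *out, size_t outsz)
{
	const char *p, *end;
	size_t len;

	assert(content && out && outsz);
	if (strncmp(content, "ref:", 4))
		return -1;
	p = content + 4;
	while (isspace((unsigned char)*p))
		p++;
	/* Like strbuf_rtrim(): ignore every trailing isspace() byte. */
	end = p + strlen(p);
	while (end > p && isspace((unsigned char)end[-1]))
		end--;
	len = (size_t)(end - p);
	if (!len || len >= outsz)
		return -1;
	/* Simplified refname check: no embedded whitespace allowed. */
	for (const char *q = p; q < end; q++)
		if (isspace((unsigned char)*q))
			return -1; /* e.g. "ref: refs/heads/master garbage" */
	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}
```

All three "correct" examples from the patch description parse to
"refs/heads/master", while both "bad" examples are rejected because
non-whitespace bytes remain after the refname.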

> The following are bad
> symbolic contents which will be reported as fsck error by "git-fsck(1)".
>
> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
>
> In order to provide above checks, we will use "strrchr" to check whether
> we have newline in the ref content.

strrchr() to look for only LF is overly strict.  You need to match
what refs/files-backend.c:read_ref_internal() does to the contents
read from such a loose ref file, i.e. strbuf_rtrim().  Any isspace()
bytes are trimmed at the end, including SP, HT, CR and LF.

> +static int files_fsck_symref_target(struct fsck_options *o,
> +				    struct fsck_ref_report *report,
> +				    const char *refname,
> +				    struct strbuf *pointee_name,
> +				    struct strbuf *pointee_path)
> +{
> +	const char *newline_pos = NULL;
> +	const char *p = NULL;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
> +
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	newline_pos = strrchr(p, '\n');
> +	if (!newline_pos || *(newline_pos + 1)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_REF_MISSING_NEWLINE,
> +				      "missing newline");

If newline_pos is NULL, it is truly a "missing newline" situation.
If I am reading the code correctly, the severity level is set to
INFO, which is good.

If newline_pos is not NULL but newline_pos[1] is not NUL, however,
that is not a "missing newline".  "ref: refs/heads/master\n " would
trigger this report, for example.

As far as I can tell, such a textual symbolic ref is taken as a
valid symbolic ref pointing at "refs/heads/master" by
refs/files-backend.c:read_ref_internal(), so we are trying to detect
a valid but curiously formatted textual symbolic ref file with the
above code?

And strrchr() to find the last LF is not sufficient for that
purpose.  We would never write "ref:  refs/heads/master \n",
but the above code will find the LF, be satisfied that the LF is
followed by NUL, without realizing that SP there is not something we
would have written!

I am not sure it is worth detecting contents that we would not have
written ourselves, but if it is, then you would probably need to do

    (1) check the last byte of pointee_name.buf[] to make sure that
        it is LF; and
    (2) remember pointee_name.len, run strbuf_rtrim() on pointee_name,
        and make sure that the LF at the end was the only thing that
        was trimmed by checking pointee_name.len after trimming.

or something like that.  Then you do not have to have an ugly "oh we
need to check again"---the production code would not do that, either.
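The two steps suggested above can be sketched on a plain C string.
rtrim_len() and ends_with_single_lf() are hypothetical helpers, with
rtrim_len() imitating strbuf_rtrim() by dropping trailing isspace() bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for strbuf_rtrim(): new length after trimming. */
static size_t rtrim_len(const char *buf, size_t len)
{
	while (len > 0 && isspace((unsigned char)buf[len - 1]))
		len--;
	return len;
}

/*
 * Return 1 when buf ends with exactly one LF and no other trailing
 * whitespace, i.e. the form Git itself writes; 0 otherwise.
 */
static int ends_with_single_lf(const char *buf)
{
	size_t len;

	assert(buf);
	len = strlen(buf);
	/* (1) the last byte must be LF */
	if (!len || buf[len - 1] != '\n')
		return 0;
	/* (2) trimming must remove only that LF */
	return rtrim_len(buf, len) == len - 1;
}
```

Because trimming happens once up front, a later check_refname_format()
call would only need to run on the trimmed name, with no retry.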

> +	if (check_refname_format(pointee_name->buf, 0)) {
> +		/*
> +		 * When containing null-garbage, "check_refname_format" will
> +		 * fail, we should trim the "pointee" to check again.
> +		 */
> +		strbuf_rtrim(pointee_name);
> +		if (!check_refname_format(pointee_name->buf, 0)) {
> +			ret = fsck_report_ref(o, report,
> +					      FSCK_MSG_TRAILING_REF_CONTENT,
> +					      "trailing null-garbage");
> +			goto out;
> +		}

IOW, the above "let's retry" feels totally wrong.  You shouldn't
have to do so, and that comes from running check_refname_format()
before rtrimming the pointee_name.

> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to refname with invalid format");
> +	}

Good.  With this check, we know that the referent, if exists, is
well-formed.  The contents of the referent will then be checked just
like all other refs that may not be pointed by any symbolic ref.

> +	/*
> +	 * Missing target should not be treated as any error worthy event and
> +	 * not even warn. It is a common case that a symbolic ref points to a
> +	 * ref that does not exist yet. If the target ref does not exist, just
> +	 * skip the check for the file type.
> +	 */
> +	if (lstat(pointee_path->buf, &st) < 0)
> +		goto out;

Good.

> +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to an invalid file type");
> +		goto out;

I do not think it is wrong per se, but I am not sure if this check
is needed, either.  When "git fsck" or "git refs verify" is told to
check the loose refs, wouldn't it walk the refs directory and report
such an unusual filesystem entity that is not a regular file,
symbolic link, or a directory as "there is unusual cruft
here"?
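For reference, the file-type check under discussion amounts to roughly
the following sketch. classify_pointee() and enum pointee_kind are
hypothetical names, and POSIX lstat(2) is assumed; using lstat() rather
than stat() means a symbolic link is classified itself instead of being
followed.

```c
#define _XOPEN_SOURCE 700 /* for lstat(2) and S_ISLNK */
#include <assert.h>
#include <sys/stat.h>

enum pointee_kind {
	POINTEE_MISSING,  /* target does not exist: skipped, not reported */
	POINTEE_OK,       /* regular file or symbolic link */
	POINTEE_BAD_TYPE  /* directory, FIFO, socket, device, ... */
};

static enum pointee_kind classify_pointee(const char *path)
{
	struct stat st;

	assert(path);
	/* A missing target is common (e.g. an unborn branch): not an error. */
	if (lstat(path, &st) < 0)
		return POINTEE_MISSING;
	if (S_ISREG(st.st_mode) || S_ISLNK(st.st_mode))
		return POINTEE_OK;
	return POINTEE_BAD_TYPE;
}
```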


* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-27 16:07     ` [PATCH v2 2/4] ref: add regular ref content check for files backend shejialuo
  2024-08-27 16:19       ` shejialuo
  2024-08-27 18:21       ` Junio C Hamano
@ 2024-08-28 12:50       ` Patrick Steinhardt
  2024-08-28 14:41         ` shejialuo
  2024-08-28 15:30         ` Junio C Hamano
  2 siblings, 2 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-28 12:50 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Aug 28, 2024 at 12:07:58AM +0800, shejialuo wrote:
> @@ -170,6 +173,12 @@
>  `nullSha1`::
>  	(WARN) Tree contains entries pointing to a null sha1.
>  
> +`refMissingNewline`::
> +	(INFO) A valid ref does not end with newline.

This reads a bit funny to me. If the ref is valid, why do we complain?

Maybe this would read better if you said "An otherwise valid ref does
not end with a newline".

> @@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	struct fsck_ref_report report = {0};
> +	const char *trailing = NULL;
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISREG(iter->st.st_mode)) {

This is still indenting the whole body. You mentioned that you don't
want to use `goto`, but in our codebase it's actually quite idiomatic.
And you already use it anyway.

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 71a4d1a5ae..7c1910d784 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -89,4 +89,91 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
>  	test_must_be_empty err
>  '
>  
> +test_expect_success 'regular ref content should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	branch_dir_prefix=.git/refs/heads &&
> +	tag_dir_prefix=.git/refs/tags &&
> +	cd repo &&
> +	git commit --allow-empty -m initial &&
> +	git checkout -b branch-1 &&
> +	git tag tag-1 &&
> +	git commit --allow-empty -m second &&
> +	git checkout -b branch-2 &&
> +	git tag tag-2 &&
> +	git checkout -b a/b/tag-2 &&

Wouldn't it be sufficient to only create a single commit, e.g. via
`test_commit`? From all I can see all you need is some object ID, so
creating the tags and second commit doesn't seem to be necessary.

> +	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&

We don't typically have spaces after the redirect. So you should remove
them here and in all the subsequent instances.

> +	git refs verify 2>err &&
> +	cat >expect <<-EOF &&
> +	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
> +	EOF
> +	rm $branch_dir_prefix/branch-1-no-newline &&
> +	test_cmp expect err &&

I was wondering whether each of these cases should be a separate test,
but that may be a bit wasteful. Alternatively, can we maybe set up a
single repository with all the garbage that we want to verify and then
double check that executing `git refs verify` surfaces them all in a
single invocation?

Patrick


* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-27 18:21       ` Junio C Hamano
@ 2024-08-28 12:50         ` Patrick Steinhardt
  2024-08-28 16:32           ` Junio C Hamano
  2024-08-28 14:31         ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-28 12:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: shejialuo, git, Karthik Nayak

On Tue, Aug 27, 2024 at 11:21:34AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> > @@ -170,6 +173,12 @@
> >  `nullSha1`::
> >  	(WARN) Tree contains entries pointing to a null sha1.
> >  
> > +`refMissingNewline`::
> > +	(INFO) A valid ref does not end with newline.
> > +
> > +`trailingRefContent`::
> > +	(INFO) A ref has trailing contents.
> > +
> >  `treeNotSorted`::
> >  	(ERROR) A tree is not properly sorted.
> 
> There is no mention of "you shouldn't promote these to error" here,
> which is good.  But wouldn't we want to tell users to report such
> curiously formatted loose refs, after figuring out who created them,
> to help us to eventually make the check stricter in the future?
> 
> Git 3.0 boundary might be a good time to tighten interoperability
> rules such that we won't accept anything we wouldn't have written
> ourselves (not limited to loose ref format, but this applies to
> anything on-disk or on-wire), but we'd need enough preparation if we
> want to be able to do so in the future.

I quite like this idea. Jialuo, would you maybe want to include another
patch on top that adds a paragraph to Documentation/BreakingChanges.txt?
It should note that this is not yet settled and depends on whether or
not we see complaints with your new checks.

I guess another prereq for the change is to integrate `git refs verify`
with git-fsck(1), because otherwise people likely wouldn't see the new
messages in the first place.

Patrick


* Re: [PATCH v2 3/4] ref: add symbolic ref content check for files backend
  2024-08-27 16:08     ` [PATCH v2 3/4] ref: add symbolic " shejialuo
  2024-08-27 19:19       ` Junio C Hamano
@ 2024-08-28 12:50       ` Patrick Steinhardt
  2024-08-28 15:36         ` shejialuo
  2024-08-28 15:41         ` Junio C Hamano
  1 sibling, 2 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-28 12:50 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Aug 28, 2024 at 12:08:07AM +0800, shejialuo wrote:
> We have already introduced the checks for regular refs. There is no need
> to check the consistency of the target which the symbolic ref points to.
> Instead, we just check the content of the symbolic ref itself.
> 
> In order to check the content of the symbolic ref, create a function
> "files_fsck_symref_target". It will first check whether the "pointee" is
> under the "refs/" directory and then we will check the "pointee" itself.
> 
> There is no specification about the content of the symbolic ref.
> Although we do write "ref: %s\n" to create a symbolic ref by using
> "git-symbolic-ref(1)" command. However, this is not mandatory. We still
> accept symbolic refs with null trailing garbage. Put it more specific,
> the following are correct:
> 
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"

Now that we're talking about tightening the rules for direct refs, I
wonder whether we'd also want to apply the same rules to symrefs.
Namely, when there is trailing whitespace we should generate an
INFO-level message about that, too. This is mostly for the sake of
consistency.

[snip]
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index fc074fc571..85fd058c81 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -28,6 +28,9 @@
>  `badRefName`::
>  	(ERROR) A ref has an invalid format.
>  
> +`badSymrefPointee`::
> +	(ERROR) The pointee of a symref is bad.

I think we'd want to clarify what "bad" is supposed to mean. Like, is a
missing symref pointee bad? If this is about the format of the pointee's
name, we might want to call this "badSymrefPointeeName".

Also, I think we don't typically call the value of a symbolic ref
"pointee", but "target". Searching for "pointee" in our codebase only
gives a single hit, and that one is not related to symbolic refs.

> diff --git a/fsck.h b/fsck.h
> index b85072df57..cbe837f84c 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -34,6 +34,7 @@ enum fsck_msg_type {
>  	FUNC(BAD_REF_CONTENT, ERROR) \
>  	FUNC(BAD_REF_FILETYPE, ERROR) \
>  	FUNC(BAD_REF_NAME, ERROR) \
> +	FUNC(BAD_SYMREF_POINTEE, ERROR) \
>  	FUNC(BAD_TIMEZONE, ERROR) \
>  	FUNC(BAD_TREE, ERROR) \
>  	FUNC(BAD_TREE_SHA1, ERROR) \
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 69c00073eb..382c73fcf7 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3434,11 +3434,81 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +/*
> + * Check the symref "pointee_name" and "pointee_path". The caller should
> + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
> + * would be the content after "refs:".
> + */
> +static int files_fsck_symref_target(struct fsck_options *o,
> +				    struct fsck_ref_report *report,
> +				    const char *refname,
> +				    struct strbuf *pointee_name,
> +				    struct strbuf *pointee_path)
> +{
> +	const char *newline_pos = NULL;
> +	const char *p = NULL;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
> +
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	newline_pos = strrchr(p, '\n');
> +	if (!newline_pos || *(newline_pos + 1)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_REF_MISSING_NEWLINE,
> +				      "missing newline");
> +	}

The second condition `*(newline_pos + 1)` checks whether there is any
data after the newline, doesn't it? That indicates a different kind of
error than "missing newline", namely that there is trailing garbage. I
guess we'd want to report a separate info-level message for this.

Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're
only checking for trailing garbage after the _last_ newline, not after
the first one.

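To make the strchr()/strrchr() difference concrete, here is a small standalone C sketch. The helper names are made up for illustration and are not from the patch. Each helper returns 1 if any byte follows the LF it locates.

```c
#include <string.h>

/* Hypothetical helper: 1 if any byte follows the FIRST LF in s. */
static int data_after_first_lf(const char *s)
{
	const char *nl = strchr(s, '\n');

	return nl && nl[1] != '\0';
}

/* Hypothetical helper: 1 if any byte follows the LAST LF in s. */
static int data_after_last_lf(const char *s)
{
	const char *nl = strrchr(s, '\n');

	return nl && nl[1] != '\0';
}
```

For "refs/heads/main\n \n", the strchr() variant reports trailing data, but the strrchr() variant does not. Neither variant flags "refs/heads/main \n", because there the stray space comes before the only LF.
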
> +	if (check_refname_format(pointee_name->buf, 0)) {
> +		/*
> +		 * When containing null-garbage, "check_refname_format" will
> +		 * fail, we should trim the "pointee" to check again.
> +		 */
> +		strbuf_rtrim(pointee_name);
> +		if (!check_refname_format(pointee_name->buf, 0)) {
> +			ret = fsck_report_ref(o, report,
> +					      FSCK_MSG_TRAILING_REF_CONTENT,
> +					      "trailing null-garbage");
> +			goto out;
> +		}

Ah, I didn't get at first that we're doing the check a second time here.
As mentioned above, I think we should check for trailing garbage further
up already and more explicitly.

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 7c1910d784..69280795ca 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -176,4 +176,58 @@ test_expect_success 'regular ref content should be checked' '
>  	test_cmp expect err
>  '
>  
> +test_expect_success 'symbolic ref content should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	branch_dir_prefix=.git/refs/heads &&
> +	tag_dir_prefix=.git/refs/tags &&
> +	cd repo &&
> +	git commit --allow-empty -m initial &&
> +	git checkout -b branch-1 &&
> +	git tag tag-1 &&
> +	git checkout -b a/b/branch-2 &&
> +
> +	printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline &&
> +	git refs verify 2>err &&
> +	cat >expect <<-EOF &&
> +	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
> +	EOF
> +	rm $branch_dir_prefix/branch-1-no-newline &&
> +	test_cmp expect err &&

Same comments here as in the preceding patch for the tests.

Patrick

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-27 18:21       ` Junio C Hamano
  2024-08-28 12:50         ` Patrick Steinhardt
@ 2024-08-28 14:31         ` shejialuo
  2024-08-28 16:45           ` Junio C Hamano
  1 sibling, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-08-28 14:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Tue, Aug 27, 2024 at 11:21:34AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We implicitly rely on "git-fsck(1)" to check the consistency of regular
> > refs. However, when parsing the regular refs for files backend by using
> > "files-backend.c::parse_loose_ref_contents", we allow the ref content to
> > be end with no newline or contain some garbages.
> 
> "to be end with" -> "to end with".
> "or contain" -> "or to contain" (optional, I think).
> 
> Or "... the ref content without terminating newline, or with extra
> bytes after the terminating newline."
> 

Thanks, I will fix this in the next version.

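The cases described in that commit message can be separated with a short classifier. This is a sketch only. The enum and function names are hypothetical. It also assumes, based on our reading of parse_loose_ref_contents(), that the hex object ID must be followed either by end-of-string or by a whitespace byte.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical verdicts for this sketch; they are not Git's fsck message IDs. */
enum ref_content_status {
	REF_CONTENT_OK,              /* "<hex>\n": exactly what Git writes */
	REF_CONTENT_MISSING_NEWLINE, /* "<hex>" with nothing after it */
	REF_CONTENT_TRAILING,        /* "<hex>", whitespace, then extra bytes */
	REF_CONTENT_BAD              /* rejected by the parser outright */
};

/* hexsz is 40 for SHA-1 repositories and 64 for SHA-256 repositories. */
static enum ref_content_status classify_regular_ref(const char *buf, size_t hexsz)
{
	size_t i;

	/* isxdigit('\0') is false, so short input stops here without overrun. */
	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return REF_CONTENT_BAD;

	if (buf[hexsz] == '\0')
		return REF_CONTENT_MISSING_NEWLINE;
	/* The object ID must be followed by whitespace, not e.g. "x". */
	if (!isspace((unsigned char)buf[hexsz]))
		return REF_CONTENT_BAD;
	if (buf[hexsz] == '\n' && buf[hexsz + 1] == '\0')
		return REF_CONTENT_OK;
	return REF_CONTENT_TRAILING;
}
```
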
> > It may seem that we should report an error or warn fsck message to the
> > user about above situations. However, there may be some third-party
> > tools customizing the content of refs. We should not report an error
> > fsck message.
> 
> Even though we never created such loose refs ourselves, we have
> accepted such loose refs forever, so it is entirely possible
> that third-party tools may rely on such loose refs being valid.
> Let's notice such "curiously formatted" loose ref files and
> tell the users our findings, so that we can assess the possible
> extent of damage if/when we retroactively tighten the parsing
> rules in the future.
> 

I think I could incorporate the above into the commit message to better
explain why we should not report an error-level fsck message.

> > We should not allow the user to upgrade the fsck warnings to errors. It
> > might cause compatibility issue which will break the legacy repository.
> 
> I am not sure this is a right thing to say.  If the user wants to
> ensure that the tool they use in their repository, which may include
> some third-party reimplementation of Git, would never create such a
> (semi-)malformed loose ref files, it is within their right, and it
> is the most reasonable way, to promote these "curiously formatted
> loose ref" fsck warnings to errors.
> 
> Is your "We should not allow" above backed by code that prevents
> them from promoting the warnings to errors, or is it merely a
> declaration of your intention?
> 

I introduced a misunderstanding here. In the previous paragraph, I
mentioned that if the caller sets the "strict" field in
"fsck_options", fsck warnings are automatically converted to fsck
errors, which may cause trouble.

So I think we should move this paragraph right after the previous one
to explain why we want an info-level fsck message here. The user can
still explicitly run the following command

  git -c fsck.refMissingNewline=error refs verify

to upgrade the info message to an error. But if the user passes
"--strict", like the following:

  git refs verify --strict

all fsck warnings are automatically converted to fsck errors. For now,
we do not want users to implicitly upgrade these messages to errors
via the "--strict" flag. That's why we use "FSCK_INFO" here.

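Here is a rough sketch of the precedence described above, as we understand it. An explicit fsck.<msg-id> setting always wins. "--strict" promotes only WARN to ERROR. INFO is left alone. The enum and function are hypothetical. Git's real logic lives in fsck.c and may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical severities for this sketch (Git's real enum is in fsck.h). */
enum severity { SEV_INFO, SEV_WARN, SEV_ERROR };

/*
 * dflt:       the message's built-in severity
 * configured: a severity set explicitly (e.g. fsck.<msg-id>=error),
 *             or -1 when the user configured nothing for this message
 * strict:     whether --strict was given
 */
static enum severity effective_severity(enum severity dflt, int configured,
					bool strict)
{
	if (configured >= 0)
		return (enum severity)configured; /* explicit config wins */
	if (strict && dflt == SEV_WARN)
		return SEV_ERROR;                 /* --strict promotes WARN only */
	return dflt;                              /* INFO stays INFO */
}
```
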
I was inspired by Jeff King's commit:

  4dd3b045f5 (fsck: downgrade tree badFilemode to "info", 2022-08-10)

In that commit, Jeff downgraded badFilemode to "info" to avoid exactly
this situation. I will improve the commit message to make this clearer.

However, in my view the semantics of "FSCK_INFO" are a little
misleading here. The comment says:

  /* infos (reported as warnings, but ignored by default) */

The phrase "ignored by default" is confusing. We actually rank "info"
below "warn" so that it is not automatically converted to "error" when
the "strict" field in "fsck_options" is set.

But "ignored by default" makes the user think "it's info, yet it is
reported as a warning?". The real intention of "FSCK_INFO" is hard to
see without the context above.

But this is beyond the scope of this patch; we may improve it later.

> > @@ -170,6 +173,12 @@
> >  `nullSha1`::
> >  	(WARN) Tree contains entries pointing to a null sha1.
> >  
> > +`refMissingNewline`::
> > +	(INFO) A valid ref does not end with newline.
> > +
> > +`trailingRefContent`::
> > +	(INFO) A ref has trailing contents.
> > +
> >  `treeNotSorted`::
> >  	(ERROR) A tree is not properly sorted.
> 
> There is no mention of "you shouldn't promote these to error" here,
> which is good.  But wouldn't we want to tell users to report such
> curiously formatted loose refs, after figuring out who created them,
> to help us to eventually make the check stricter in the future?
> 

Following Patrick's review, I will later add another patch to
"Documentation/BreakingChanges.txt".

> Git 3.0 boundary might be a good time to tighten interoperability
> rules such that we won't accept anything we wouldn't have written
> ourselves (not limited to loose ref format, but this applies to
> anything on-disk or on-wire), but we'd need enough preparation if we
> want to be able to do so in the future.
> 
> Thanks.
> 

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-28 12:50       ` Patrick Steinhardt
@ 2024-08-28 14:41         ` shejialuo
  2024-08-28 15:30         ` Junio C Hamano
  1 sibling, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-28 14:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Aug 28, 2024 at 02:50:01PM +0200, Patrick Steinhardt wrote:
> On Wed, Aug 28, 2024 at 12:07:58AM +0800, shejialuo wrote:
> > @@ -170,6 +173,12 @@
> >  `nullSha1`::
> >  	(WARN) Tree contains entries pointing to a null sha1.
> >  
> > +`refMissingNewline`::
> > +	(INFO) A valid ref does not end with newline.
> 
> This reads a bit funny to me. If the ref is valid, why do we complain?
> 
> Maybe this would read better if you said "An otherwise valid ref does
> not end with a newline".
> 

I think we should just drop "valid" here, because the message also
applies to symrefs, which may be missing a newline without being valid.

I will improve this in the next version.

> > @@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
> >  				  const char *refs_check_dir,
> >  				  struct dir_iterator *iter);
> >  
> > +static int files_fsck_refs_content(struct ref_store *ref_store,
> > +				   struct fsck_options *o,
> > +				   const char *refs_check_dir,
> > +				   struct dir_iterator *iter)
> > +{
> > +	struct strbuf ref_content = STRBUF_INIT;
> > +	struct strbuf referent = STRBUF_INIT;
> > +	struct strbuf refname = STRBUF_INIT;
> > +	struct fsck_ref_report report = {0};
> > +	const char *trailing = NULL;
> > +	unsigned int type = 0;
> > +	int failure_errno = 0;
> > +	struct object_id oid;
> > +	int ret = 0;
> > +
> > +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> > +	report.path = refname.buf;
> > +
> > +	if (S_ISREG(iter->st.st_mode)) {
> 
> This is still indenting the whole body. You mentioned that you don't
> want to use `goto`, but in our codebase it's actually quite idiomatic.
> And you already use it anyway.
> 

Agreed, the extra indentation is noisy. I will use "goto" to avoid it.

> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > index 71a4d1a5ae..7c1910d784 100755
> > --- a/t/t0602-reffiles-fsck.sh
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -89,4 +89,91 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
> >  	test_must_be_empty err
> >  '
> >  
> > +test_expect_success 'regular ref content should be checked' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	branch_dir_prefix=.git/refs/heads &&
> > +	tag_dir_prefix=.git/refs/tags &&
> > +	cd repo &&
> > +	git commit --allow-empty -m initial &&
> > +	git checkout -b branch-1 &&
> > +	git tag tag-1 &&
> > +	git commit --allow-empty -m second &&
> > +	git checkout -b branch-2 &&
> > +	git tag tag-2 &&
> > +	git checkout -b a/b/tag-2 &&
> 
> Wouldn't it be sufficient to only create a single commit, e.g. via
> `test_commit`? From all I can see all you need is some object ID, so
> creating the tags and second commit doesn't seem to be necessary.
> 

I agree. I will clean up the code in the next version.

> > +	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&
> 
> We don't typically have spaces after the redirect. So you should remove
> them here and in all the subsequent instances.
> 

I will fix the code style here.

> > +	git refs verify 2>err &&
> > +	cat >expect <<-EOF &&
> > +	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
> > +	EOF
> > +	rm $branch_dir_prefix/branch-1-no-newline &&
> > +	test_cmp expect err &&
> 
> I was wondering whether each of these cases should be a separate test,
> but that may be a bit wasteful. Alternatively, can we maybe set up a
> single repository with all the garbage that we want to verify and then
> double check that executing `git refs verify` surfaces them all in a
> single invocation?
> 

I also thought about splitting the tests, which might be clearer, but
dropped the idea for the same reason you gave. I DO agree that we
should set up a single repository containing all the garbage we want
to verify.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 3/4] ref: add symbolic ref content check for files backend
  2024-08-27 19:19       ` Junio C Hamano
@ 2024-08-28 15:26         ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-28 15:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Tue, Aug 27, 2024 at 12:19:11PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > In order to check the content of the symbolic ref, create a function
> > "files_fsck_symref_target". It will first check whether the "pointee" is
> > under the "refs/" directory and then we will check the "pointee" itself.
> 
> Hmph, as the pointee must be within the usual places that you would
> find refs (either in refs/ directory or pseudo ref files immediately
> below $GIT_DIR), wouldn't we check the pointee when fsck (or "git
> refs verify") run and check everything?  The pointee will have its
> turn to be checked, and I am not sure why you need to check the
> pointee when you find a symbolic ref is pointing at it, which will
> lead for it to be checked twice (or more).
> 
> I however did not find an additional code to "check the pointee itself"
> in the patch, so perhaps it is OK---the only thing that needs fixing
> may be the above paragraph if that is the case.
> 

Yes, "we will check the 'pointee' itself" confuses the reader. We do
not actually check the pointee; we check the content of the symref. I
will fix that paragraph in the next version.

> > There is no specification about the content of the symbolic ref.
> > Although we do write "ref: %s\n" to create a symbolic ref by using
> > "git-symbolic-ref(1)" command. However, this is not mandatory. We still
> > accept symbolic refs with null trailing garbage. Put it more specific,
> > the following are correct:
> >
> > 1. "ref: refs/heads/master   "
> > 2. "ref: refs/heads/master   \n  \n"
> > 3. "ref: refs/heads/master\n\n"
> >
> > But we do not allow any non-null trailing garbage.
> 
> Your use of word "null" is probably too confusing to contributors to
> this project.  None of the above has NUL bytes in them.  I think you
> want to say something like this:
> 
>     A regular file is accepted as a textual symbolic ref if it
>     begins with "ref:", followed by zero or more whitespaces,
>     followed by the full refname (e.g. "refs/heads/master",
>     "refs/tags/v1.0"), followed only by whitespace characters.  We
>     always write a single SP after "ref:" and a single LF after the
>     full refname, but third-party reimplementations of Git may have
>     taken advantage of the looser syntax that is allowed as above.
> 

Thanks for your suggestion. I will improve this in the next version.

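Here is a minimal sketch of a parser for the grammar in the suggested wording above: "ref:", zero or more whitespace characters, the refname, and then whitespace only. The function name is hypothetical. The sketch does not validate the refname itself; in Git, that would be check_refname_format()'s job.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: on success, point *target at the refname,
 * store its length in *len and return 1; otherwise return 0.
 */
static int parse_textual_symref(const char *buf, const char **target,
				size_t *len)
{
	const char *p, *end, *q;

	if (strncmp(buf, "ref:", 4))
		return 0;          /* not a textual symref at all */
	p = buf + 4;
	while (isspace((unsigned char)*p))
		p++;               /* zero or more whitespace after "ref:" */
	end = p;
	while (*end && !isspace((unsigned char)*end))
		end++;             /* refname runs up to the next whitespace */
	if (end == p)
		return 0;          /* no refname at all */
	for (q = end; *q; q++)
		if (!isspace((unsigned char)*q))
			return 0;  /* non-whitespace after the refname */
	*target = p;
	*len = (size_t)(end - p);
	return 1;
}
```
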
> > The following are bad
> > symbolic contents which will be reported as fsck error by "git-fsck(1)".
> >
> > 1. "ref: refs/heads/master garbage\n"
> > 2. "ref: refs/heads/master \n\n\n garbage  "
> >
> > In order to provide above checks, we will use "strrchr" to check whether
> > we have newline in the ref content.
> 
> strrchr() to look for only LF is overly strict.  You need to match
> what refs/files-backend.c:read_ref_internal() does to the contents
> read from such a loose ref file, i.e. strbuf_rtrim().  Any isspace()
> bytes are trimmed at the end, including SP, HT, CR and LF.
> 

I will look into how "strbuf_rtrim" works to see whether we can reuse
it and avoid duplicating logic.

> > +static int files_fsck_symref_target(struct fsck_options *o,
> > +				    struct fsck_ref_report *report,
> > +				    const char *refname,
> > +				    struct strbuf *pointee_name,
> > +				    struct strbuf *pointee_path)
> > +{
> > +	const char *newline_pos = NULL;
> > +	const char *p = NULL;
> > +	struct stat st;
> > +	int ret = 0;
> > +
> > +	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
> > +
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> > +				      "points to ref outside the refs directory");
> > +		goto out;
> > +	}
> > +
> > +	newline_pos = strrchr(p, '\n');
> > +	if (!newline_pos || *(newline_pos + 1)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_REF_MISSING_NEWLINE,
> > +				      "missing newline");
> 
> If newline_pos is NULL, it is truly a "missing newline" situation.
> If I am reading the code correctly, the severity level is set to
> INFO, which is good.
> 
> If newline_pos is not NULL but newline_pos[1] is not NUL, however,
> that is not a "missing newline".  "ref: refs/heads/master\n " would
> trigger this report, for example.
> 

When I designed this, I considered "ref: refs/heads/master\n " to still
be missing the newline, and we would additionally report that it has
garbage, whereas "ref: refs/heads/master\n \n" would not be missing the
newline. But I no longer think this is a good design.

I will find a better way to handle this.

> As far as I can tell, such a textual symbolic ref is taken as a
> valid symbolic ref pointing at "refs/heads/master" by
> refs/files-backend.c:read_ref_internal(), so we are trying to detect
> a valid but curiously formatted textual symbolic ref file with the
> above code?

Yes, these are accepted as valid symbolic refs, but something is still
off about them. That is exactly what we need to detect.

> 
> And strrchr() to find the last LF is not sufficient for that
> purpose.  We would never write "ref:  refs/heads/master \n",
> but the above code will find the LF, be satisfied that the LF is
> followed by NUL, without realizing that SP there is not something we
> would have written!

I completely overlooked this situation; the current patch cannot
detect it. Now I understand why Patrick suggested "strchr" rather than
"strrchr": I assumed we should find the last '\n', but we actually need
the first one. However, even "strchr" fails on this example, because
the stray SP comes before the only LF. This part needs to be redesigned.

> 
> I am not sure if that is worth detecting that if it is something we
> would have written, but if that were the case, then you would
> probably need to do
> 
>     (1) check the last byte of pointee_name.buf[] to make sure that
>         it is LF; and
>     (2) remember pointee_name.len, run strbuf_rtrim() on pointee_name,
>         and that LF at the end was the only thing that was trimmed by
>         checking the pointee_name.len after trimming.
> 
> or something like that.  Then you do not have to have an ugly "oh we
> need to check again"---the production code would not do that, either.
> 

Yes, this is a good idea.

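On a plain NUL-terminated buffer, the two steps could look roughly like the following. The function name is hypothetical. Real code would operate on a struct strbuf and call strbuf_rtrim() instead of hand-rolling the loop. Like strbuf_rtrim(), the sketch trims every trailing isspace() byte (SP, HT, CR, LF, ...).

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Trim trailing whitespace from buf in place (updating *len) and
 * return 1 only if the single byte removed was the terminating LF
 * that Git itself writes.
 */
static int rtrim_and_check_lf(char *buf, size_t *len)
{
	size_t orig = *len;
	/* step (1): is the last byte an LF? */
	int ends_with_lf = orig > 0 && buf[orig - 1] == '\n';

	while (*len > 0 && isspace((unsigned char)buf[*len - 1]))
		(*len)--;
	buf[*len] = '\0';

	/* step (2): was that LF the only thing trimmed? */
	return ends_with_lf && orig - *len == 1;
}
```

This removes the need for a second check_refname_format() call. The caller trims once and validates the trimmed name once.
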
> > +	if (check_refname_format(pointee_name->buf, 0)) {
> > +		/*
> > +		 * When containing null-garbage, "check_refname_format" will
> > +		 * fail, we should trim the "pointee" to check again.
> > +		 */
> > +		strbuf_rtrim(pointee_name);
> > +		if (!check_refname_format(pointee_name->buf, 0)) {
> > +			ret = fsck_report_ref(o, report,
> > +					      FSCK_MSG_TRAILING_REF_CONTENT,
> > +					      "trailing null-garbage");
> > +			goto out;
> > +		}
> 
> IOW, the above "let's retry" feels totally wrong.  You shouldn't
> have to do so, and that comes from running check_refname_format()
> before rtrimming the pointee_name.
> 

Yes. I did consider comparing the length before and after
"strbuf_rtrim", but I didn't want to introduce two new variables, so I
called "check_refname_format" twice instead.

I will fix this in the next version.

> > +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> > +				      "points to an invalid file type");
> > +		goto out;
> 
> I do not think it is wrong per se, but I am not sure if this check
> is needed, either.  When "git fsck" or "git refs verify" is told to
> check the loose refs, wouldn't it walk the refs directory and report
> such an unusual filesystem entity that is not a regular file,
> symbolic link, or a directory as "there is unusual cruft exist
> here"?

When we set up the infrastructure, we already made it report any
filesystem entity that is neither a regular file nor a symbolic link
(directories are skipped), like the following:

    if (S_ISDIR(iter->st.st_mode)) {
        continue;
    } else if (S_ISREG(iter->st.st_mode) ||
               S_ISLNK(iter->st.st_mode)) {
        ...;
    } else {
        /* report a filesystem error */
    }

We do not check directories, because a directory is always valid in
the filesystem; we cannot say that

  "refs/heads/a/" is a bad ref.

So this check is mainly about whether the symref points to a
directory. Patrick also raised this question when reviewing the
previous version:

> What exactly are we guarding against here? Don't we already verify that
> files in `refs/` have the correct type? Or are we checking that it does
> not point to a directory?

However, we can remove this check, because "check_refname_format"
takes care of it for us:

  git check-ref-format 'refs/heads/'

This reports an error, so we could drop this check entirely and let
"check_refname_format" handle it. We could also remove the

    if (lstat(pointee_path->buf, &st) < 0)
        goto out;

The code will be much cleaner.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-28 12:50       ` Patrick Steinhardt
  2024-08-28 14:41         ` shejialuo
@ 2024-08-28 15:30         ` Junio C Hamano
  1 sibling, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-28 15:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: shejialuo, git, Karthik Nayak

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Aug 28, 2024 at 12:07:58AM +0800, shejialuo wrote:
>> @@ -170,6 +173,12 @@
>>  `nullSha1`::
>>  	(WARN) Tree contains entries pointing to a null sha1.
>>  
>> +`refMissingNewline`::
>> +	(INFO) A valid ref does not end with newline.
>
> This reads a bit funny to me. If the ref is valid, why do we complain?

I think you understood after reading the series through and
responded to my "curiously formatted" comment.  I understand that
these marked as INFO are not about "to complain" but are for us to
ask the user to report so that we can learn of any third-party tools
that may get in our way to later tighten the parsing rules
retroactively.  

> Maybe this would read better if you said "An otherwise valid ref does
> not end with a newline".

So I do agree that the text above is less than optimal.  It is "this
is valid, but something we wouldn't have written.  Who creates such
a ref?"

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 3/4] ref: add symbolic ref content check for files backend
  2024-08-28 12:50       ` Patrick Steinhardt
@ 2024-08-28 15:36         ` shejialuo
  2024-08-28 15:41         ` Junio C Hamano
  1 sibling, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-08-28 15:36 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Aug 28, 2024 at 02:50:09PM +0200, Patrick Steinhardt wrote:
> On Wed, Aug 28, 2024 at 12:08:07AM +0800, shejialuo wrote:
> > We have already introduced the checks for regular refs. There is no need
> > to check the consistency of the target which the symbolic ref points to.
> > Instead, we just check the content of the symbolic ref itself.
> > 
> > In order to check the content of the symbolic ref, create a function
> > "files_fsck_symref_target". It will first check whether the "pointee" is
> > under the "refs/" directory and then we will check the "pointee" itself.
> > 
> > There is no specification about the content of the symbolic ref.
> > Although we do write "ref: %s\n" to create a symbolic ref by using
> > "git-symbolic-ref(1)" command. However, this is not mandatory. We still
> > accept symbolic refs with null trailing garbage. Put it more specific,
> > the following are correct:
> > 
> > 1. "ref: refs/heads/master   "
> > 2. "ref: refs/heads/master   \n  \n"
> > 3. "ref: refs/heads/master\n\n"
> 
> Now that we're talking about tightening the rules for direct refs, I
> wonder whether we'd also want to apply the same rules to symrefs.
> Namely, when there is trailing whitespace we should generate an
> INFO-level message about that, too. This is mostly for the sake of
> consistency.
> 

Yes, this patch already does that. I should mention that it reuses the
INFO-level message IDs introduced in [PATCH v2 2/4].

> [snip]
> > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> > index fc074fc571..85fd058c81 100644
> > --- a/Documentation/fsck-msgids.txt
> > +++ b/Documentation/fsck-msgids.txt
> > @@ -28,6 +28,9 @@
> >  `badRefName`::
> >  	(ERROR) A ref has an invalid format.
> >  
> > +`badSymrefPointee`::
> > +	(ERROR) The pointee of a symref is bad.
> 
> I think we'd want to clarify what "bad" is supposed to mean. Like, is a
> missing symref pointee bad? If this is about the format of the pointee's
> name, we might want to call this "badSymrefPointeeName".
> 

I agree; "bad" is too general here, and we need to make it concrete.

> Also, I think we don't typically call the value of a symbolic ref
> "pointee", but "target". Searching for "pointee" in our codebase only
> gives a single hit, and that one is not related to symbolic refs.
> 

Thanks, I will fix this in the next version.

> > +/*
> > + * Check the symref "pointee_name" and "pointee_path". The caller should
> > + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
> > + * would be the content after "refs:".
> > + */
> > +static int files_fsck_symref_target(struct fsck_options *o,
> > +				    struct fsck_ref_report *report,
> > +				    const char *refname,
> > +				    struct strbuf *pointee_name,
> > +				    struct strbuf *pointee_path)
> > +{
> > +	const char *newline_pos = NULL;
> > +	const char *p = NULL;
> > +	struct stat st;
> > +	int ret = 0;
> > +
> > +	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
> > +
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> > +				      "points to ref outside the refs directory");
> > +		goto out;
> > +	}
> > +
> > +	newline_pos = strrchr(p, '\n');
> > +	if (!newline_pos || *(newline_pos + 1)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_REF_MISSING_NEWLINE,
> > +				      "missing newline");
> > +	}
> 
> The second condition `*(newline_pos + 1)` checks whether there is any
> data after the newline, doesn't it? That indicates a different kind of
> error than "missing newline", namely that there is trailing garbage. I
> guess we'd want to report a separate info-level message for this.
> 
> Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're
> only checking for trailing garbage after the _last_ newline, not after
> the first one.
> 

Yes, I made a mistake here and will work out a new design. I already
replied to Junio as follows:

> > And strrchr() to find the last LF is not sufficient for that
> > purpose.  We would never write "ref:  refs/heads/master \n",
> > but the above code will find the LF, be satisfied that the LF is
> > followed by NUL, without realizing that SP there is not something we
> > would have written!

> I totally ignored this situation, and in current patch, we cannot check
> this. I know why Patrick lets me use "strchr" but not "strrchr". I think
> we should find the last '\n'. But instead we need to find the first
> '\n'. However, in this example, we will still fail by using "strchr".
> This part should be totally re-designed.

> > +	if (check_refname_format(pointee_name->buf, 0)) {
> > +		/*
> > +		 * When containing null-garbage, "check_refname_format" will
> > +		 * fail, we should trim the "pointee" to check again.
> > +		 */
> > +		strbuf_rtrim(pointee_name);
> > +		if (!check_refname_format(pointee_name->buf, 0)) {
> > +			ret = fsck_report_ref(o, report,
> > +					      FSCK_MSG_TRAILING_REF_CONTENT,
> > +					      "trailing null-garbage");
> > +			goto out;
> > +		}
> 
> Ah, I didn't get at first that we're doing the check a second time here.
> As mentioned above, I think we should check for trailing garbage further
> up already and more explicitly.
> 

I think this implementation is simply wrong, which makes it hard for
reviewers to follow. I will drop this retry approach and check for
trailing garbage explicitly instead.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 3/4] ref: add symbolic ref content check for files backend
  2024-08-28 12:50       ` Patrick Steinhardt
  2024-08-28 15:36         ` shejialuo
@ 2024-08-28 15:41         ` Junio C Hamano
  2024-08-29 10:11           ` Patrick Steinhardt
  1 sibling, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-28 15:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: shejialuo, git, Karthik Nayak

Patrick Steinhardt <ps@pks.im> writes:

> Also, I think we don't typically call the value of a symbolic ref
> "pointee", but "target". Searching for "pointee" in our codebase only
> gives a single hit, and that one is not related to symbolic refs.

Yesterday while I was studying for reviewing this series, I saw some
existing code that call them "referent".  There may also be "target".

>> +	if (!newline_pos || *(newline_pos + 1)) {
>> +		ret = fsck_report_ref(o, report,
>> +				      FSCK_MSG_REF_MISSING_NEWLINE,
>> +				      "missing newline");
>> +	}
>
> The second condition `*(newline_pos + 1)` checks whether there is any
> data after the newline, doesn't it? That indicates a different kind of
> error than "missing newline", namely that there is trailing garbage. I
> guess we'd want to report a separate info-level message for this.
>
> Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're
> only checking for trailing garbage after the _last_ newline, not after
> the first one.

None of the above.  It should strbuf_rtrim() and if we removed
anything but just a single terminating LF, we are looking at
something we wouldn't have written.  The next check_refname_format()
call would then find "trailing garbage".

 - "refs/heads/master \n " gets rtrimmed to "refs/heads/master",
   which is "valid but curious".

 - "refs/heads/main trash\n " becomes "refs/heads/main trash",
   which is outright bad.

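Junio's two examples can be turned into a small three-way classifier. This is a sketch under stated assumptions. The input is the text after "ref:" and its leading whitespace. toy_refname_ok() is a deliberately crude, hypothetical stand-in for check_refname_format() that rejects only empty names and names containing whitespace or control bytes.

```c
#include <ctype.h>
#include <string.h>

enum symref_verdict { SYMREF_OK, SYMREF_CURIOUS, SYMREF_BAD };

/* Toy stand-in for check_refname_format(); far less thorough. */
static int toy_refname_ok(const char *name, size_t len)
{
	if (!len)
		return 0;
	for (size_t i = 0; i < len; i++)
		if ((unsigned char)name[i] <= ' ' || name[i] == 0x7f)
			return 0;
	return 1;
}

static enum symref_verdict classify_symref_target(const char *content)
{
	size_t orig = strlen(content), len = orig;

	/* rtrim like strbuf_rtrim(): drop every trailing isspace() byte */
	while (len > 0 && isspace((unsigned char)content[len - 1]))
		len--;
	if (!toy_refname_ok(content, len))
		return SYMREF_BAD;      /* e.g. "refs/heads/main trash" */
	if (orig - len == 1 && content[len] == '\n')
		return SYMREF_OK;       /* exactly one terminating LF was trimmed */
	return SYMREF_CURIOUS;          /* valid, but not what Git would write */
}
```
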

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-28 12:50         ` Patrick Steinhardt
@ 2024-08-28 16:32           ` Junio C Hamano
  2024-08-29 10:19             ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-28 16:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: shejialuo, git, Karthik Nayak

Patrick Steinhardt <ps@pks.im> writes:

>> Git 3.0 boundary might be a good time to tighten interoperability
>> rules such that we won't accept anything we wouldn't have written
>> ourselves (not limited to loose ref format, but this applies to
>> anything on-disk or on-wire), but we'd need enough preparation if we
>> want to be able to do so in the future.
>
> I quite like this idea.

I wouldn't say that I wrote it as a devil's advocate comment, but I
was hoping that somebody would quote Postel in response, as the above
advocates a directly opposite position, which I wouldn't usually
take.

> I guess another prereq for the change is to integrate `git refs verify`
> with git-fsck(1), because otherwise people likely wouldn't see the new
> messages in the first place.

Absolutely.

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-28 14:31         ` shejialuo
@ 2024-08-28 16:45           ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-28 16:45 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

>> > @@ -170,6 +173,12 @@
>> >  `nullSha1`::
>> >  	(WARN) Tree contains entries pointing to a null sha1.
>> >  
>> > +`refMissingNewline`::
>> > +	(INFO) A valid ref does not end with newline.
>> > +
>> > +`trailingRefContent`::
>> > +	(INFO) A ref has trailing contents.
>> > +
>> >  `treeNotSorted`::
>> >  	(ERROR) A tree is not properly sorted.
>> 
>> There is no mention of "you shouldn't promote these to error" here,
>> which is good.  But wouldn't we want to tell users to report such
>> curiously formatted loose refs, after figuring out who created them,
>> to help us to eventually make the check stricter in the future?
>
> Following Patrick's review, I will add another patch to
> "Documentation/BreakingChanges.txt" later.

As that documentation is not end-user facing, it is orthogonal and
unrelated.

What I meant was that we need to tell the user that the refs they
have (and the third-party tools they used to create them) may be
declared invalid in a future version of Git and they would want to
report it, in order to influence our possible future direction.  And
we need to do so in an end-user facing documentation (i.e. the part
of the patch quoted above) and/or in the info messages themselves.


* [PATCH] SQUASH??? remove unused parameters
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
                       ` (3 preceding siblings ...)
  2024-08-27 16:08     ` [PATCH v2 4/4] ref: add symlink ref " shejialuo
@ 2024-08-28 18:42     ` Junio C Hamano
  2024-08-28 21:28     ` [PATCH v2 0/4] add ref content check for files backend Junio C Hamano
  2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
  6 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-28 18:42 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

With -Wunused-parameter, the compiler notices that many parameters
are unused.  They are truly unused, and the signatures for the
functions involved are not constrained externally, so we can simply
drop the parameters from the definition of these functions and their
callers.

Please squash these in when the topic gets rerolled.  Thanks.

 refs/files-backend.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8641e3ba65..69dd283c9d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1966,9 +1966,8 @@ static int create_ref_symlink(struct ref_lock *lock, const char *target)
 	return ret;
 }
 
-static int create_symref_lock(struct files_ref_store *refs,
-			      struct ref_lock *lock, const char *refname,
-			      const char *target, struct strbuf *err)
+static int create_symref_lock(struct ref_lock *lock, const char *target,
+			      struct strbuf *err)
 {
 	if (!fdopen_lock_file(&lock->lk, "w")) {
 		strbuf_addf(err, "unable to fdopen %s: %s",
@@ -2584,8 +2583,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	}
 
 	if (update->new_target && !(update->flags & REF_LOG_ONLY)) {
-		if (create_symref_lock(refs, lock, update->refname,
-				       update->new_target, err)) {
+		if (create_symref_lock(lock, update->new_target, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
 			goto out;
 		}
@@ -3443,7 +3441,6 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
  */
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
-				    const char *refname,
 				    struct strbuf *pointee_name,
 				    struct strbuf *pointee_path,
 				    unsigned int symbolic_link)
@@ -3565,7 +3562,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		} else {
 			strbuf_addf(&pointee_path, "%s/%s",
 				    ref_store->gitdir, referent.buf);
-			ret = files_fsck_symref_target(o, &report, refname.buf,
+			ret = files_fsck_symref_target(o, &report,
 						       &referent,
 						       &pointee_path,
 						       symbolic_link);
@@ -3589,7 +3586,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	}
 
 	strbuf_addstr(&referent, pointee_name);
-	ret = files_fsck_symref_target(o, &report, refname.buf,
+	ret = files_fsck_symref_target(o, &report,
 				       &referent, &pointee_path,
 				       symbolic_link);
 
-- 
2.46.0-563-gaeb9b172ce


* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
                       ` (4 preceding siblings ...)
  2024-08-28 18:42     ` [PATCH] SQUASH??? remove unused parameters Junio C Hamano
@ 2024-08-28 21:28     ` Junio C Hamano
  2024-08-29  4:02       ` Jeff King
  2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
  6 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-28 21:28 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Jeff King

Here is another one.

By the way, Peff, do we have MAYBE_UNUSED that can be used in a case
like this one?  Platforms without symbolic links supported may well
define NO_SYMLINK_HEAD, which makes the incoming parameters unused.

static int create_ref_symlink(struct ref_lock *lock, const char *target)
{
	int ret = -1;
#ifndef NO_SYMLINK_HEAD
	char *ref_path = get_locked_file_path(&lock->lk);
	unlink(ref_path);
	ret = symlink(target, ref_path);
	free(ref_path);

	if (ret)
		fprintf(stderr, "no symlink - falling back to symbolic ref\n");
#endif
	return ret;
}

We can of course do the attached, which I'll let shejialuo squash
into an appropriate patch in the series.

Thanks.


 refs/files-backend.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git c/refs/files-backend.c w/refs/files-backend.c
index 69dd283c9d..110af32788 100644
--- c/refs/files-backend.c
+++ w/refs/files-backend.c
@@ -1951,10 +1951,13 @@ static int commit_ref_update(struct files_ref_store *refs,
 	return 0;
 }
 
+#ifdef NO_SYMLINK_HEAD
+#define create_ref_symlink(lock, referent) (-1)
+#else
 static int create_ref_symlink(struct ref_lock *lock, const char *target)
 {
 	int ret = -1;
-#ifndef NO_SYMLINK_HEAD
+
 	char *ref_path = get_locked_file_path(&lock->lk);
 	unlink(ref_path);
 	ret = symlink(target, ref_path);
@@ -1962,9 +1965,9 @@ static int create_ref_symlink(struct ref_lock *lock, const char *target)
 
 	if (ret)
 		fprintf(stderr, "no symlink - falling back to symbolic ref\n");
-#endif
 	return ret;
 }
+#endif
 
 static int create_symref_lock(struct ref_lock *lock, const char *target,
 			      struct strbuf *err)

* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-28 21:28     ` [PATCH v2 0/4] add ref content check for files backend Junio C Hamano
@ 2024-08-29  4:02       ` Jeff King
  2024-08-29  4:59         ` Junio C Hamano
  2024-08-29 15:00         ` [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED Junio C Hamano
  0 siblings, 2 replies; 209+ messages in thread
From: Jeff King @ 2024-08-29  4:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: shejialuo, git

On Wed, Aug 28, 2024 at 02:28:47PM -0700, Junio C Hamano wrote:

> By the way, Peff, do we have MAYBE_UNUSED that can be used in a case
> like this one?  Platforms without symbolic links supported may well
> define NO_SYMLINK_HEAD, which makes the incoming parameters unused.

Yes, it would be fine to use MAYBE_UNUSED in a case like this.

The other option, and what I did for a conditional compilation in
imap-send.c, is to just mention the variable like:

  /* mark as used to appease -Wunused-parameter with NO_SYMLINK_HEAD */
  (void)lock;
  (void)target;

In retrospect I think MAYBE_UNUSED is probably a little less magical,
and I perhaps should have used it there.
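For comparison, a small sketch of both approaches on a hypothetical
function whose parameter is only used when a hypothetical NO_FEATURE
knob is not defined.  MAYBE_UNUSED is defined locally here to mirror
git-compat-util.h (GCC/Clang attribute syntax, with a no-op fallback
for other compilers), so the sketch compiles on its own:

```c
#include <string.h>

/* Local stand-in for Git's MAYBE_UNUSED; assumption for this sketch. */
#if defined(__GNUC__)
#define MAYBE_UNUSED __attribute__((__unused__))
#else
#define MAYBE_UNUSED /* noop */
#endif

/* Option 1: "mention" the parameter so -Wunused-parameter stays quiet. */
static int target_len_cast(const char *target)
{
#ifdef NO_FEATURE
	(void)target; /* mark as used in the NO_FEATURE build */
	return -1;
#else
	return (int)strlen(target);
#endif
}

/* Option 2: annotate the parameter; neither build warns. */
static int target_len_annotated(const char *target MAYBE_UNUSED)
{
#ifdef NO_FEATURE
	return -1;
#else
	return (int)strlen(target);
#endif
}
```

The annotation states the intent at the declaration instead of in the
body, which is presumably what makes it feel less magical.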

In this particular case, though, where there's no actual code in one
half of the #ifdef, I think just defining two separate functions is
cleaner. I.e., what you did with a macro below, though I'd probably have
just used a real function with UNUSED markers.
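A sketch of that "two separate functions" shape, using a simplified,
hypothetical create_link() that takes a plain path instead of Git's
struct ref_lock.  The UNUSED definition below is a local stand-in;
Git's own UNUSED additionally applies a deprecated attribute so that
accidentally using such a parameter triggers a warning.

```c
#include <unistd.h>

/* Local stand-in for Git's UNUSED; assumption for this sketch. */
#if defined(__GNUC__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED /* noop */
#endif

#ifdef NO_SYMLINK_HEAD
/* This build cannot create symlinks; both parameters are always unused. */
static int create_link(const char *target UNUSED, const char *path UNUSED)
{
	return -1;
}
#else
/* Replace whatever is at path with a symlink pointing to target. */
static int create_link(const char *target, const char *path)
{
	unlink(path); /* failure is fine: path may not exist yet */
	return symlink(target, path);
}
#endif
```

One likely advantage over the function-like macro is that each variant
is an ordinary function, so calls are still type-checked in a
NO_SYMLINK_HEAD build, where the macro would simply discard its
arguments.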

As an aside, I wonder if we should consider deprecating and eventually
dropping support for core.prefersymlinkrefs. I can't think of a reason
anybody would want to use it, and of course it makes no sense as we move
on to alternate backends like reftables. I sent patches ages ago:

  https://lore.kernel.org/git/20151229060055.GA17047@sigill.intra.peff.net/

but I think it may have just gotten lost in the shuffle, and I've
somehow been meaning to re-submit them for 9 years. :-/

-Peff

* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-29  4:02       ` Jeff King
@ 2024-08-29  4:59         ` Junio C Hamano
  2024-08-29  7:00           ` Patrick Steinhardt
  2024-08-29 15:48           ` shejialuo
  2024-08-29 15:00         ` [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED Junio C Hamano
  1 sibling, 2 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29  4:59 UTC (permalink / raw)
  To: Jeff King; +Cc: shejialuo, git

Jeff King <peff@peff.net> writes:

> As an aside, I wonder if we should consider deprecating and eventually
> dropping support for core.prefersymlinkrefs. I can't think of a reason
> anybody would want to use it, and of course it makes no sense as we move
> on to alternate backends like reftables.

Yup.  Perhaps add an entry or two to BreakingChanges document?

 Documentation/BreakingChanges.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git c/Documentation/BreakingChanges.txt w/Documentation/BreakingChanges.txt
index 0532bfcf7f..2a85740f3c 100644
--- c/Documentation/BreakingChanges.txt
+++ w/Documentation/BreakingChanges.txt
@@ -115,6 +115,12 @@ info/grafts as outdated, 2014-03-05) and will be removed.
 +
 Cf. <20140304174806.GA11561@sigill.intra.peff.net>.
 
+* Support for core.prefersymlinkrefs will be dropped.  Support for
+  existing repositories that use symbolic links to represent a
+  symbolic ref may or may not be dropped.
++
+Cf. <20240829040215.GA4054823@coredump.intra.peff.net>
+
 == Superseded features that will not be deprecated
 
 Some features have gained newer replacements that aim to improve the design in

* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-29  4:59         ` Junio C Hamano
@ 2024-08-29  7:00           ` Patrick Steinhardt
  2024-08-29 15:07             ` Junio C Hamano
  2024-08-29 19:48             ` Jeff King
  2024-08-29 15:48           ` shejialuo
  1 sibling, 2 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-29  7:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, shejialuo, git

On Wed, Aug 28, 2024 at 09:59:58PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > As an aside, I wonder if we should consider deprecating and eventually
> > dropping support for core.prefersymlinkrefs. I can't think of a reason
> > anybody would want to use it, and of course it makes no sense as we move
> > on to alternate backends like reftables.
> 
> Yup.  Perhaps add an entry or two to BreakingChanges document?
> 
>  Documentation/BreakingChanges.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git c/Documentation/BreakingChanges.txt w/Documentation/BreakingChanges.txt
> index 0532bfcf7f..2a85740f3c 100644
> --- c/Documentation/BreakingChanges.txt
> +++ w/Documentation/BreakingChanges.txt
> @@ -115,6 +115,12 @@ info/grafts as outdated, 2014-03-05) and will be removed.
>  +
>  Cf. <20140304174806.GA11561@sigill.intra.peff.net>.
>  
> +* Support for core.prefersymlinkrefs will be dropped.  Support for
> +  existing repositories that use symbolic links to represent a
> +  symbolic ref may or may not be dropped.
> ++
> +Cf. <20240829040215.GA4054823@coredump.intra.peff.net>
> +
>  == Superseded features that will not be deprecated

Yes, I'm very much in favor of that. As Peff said, I don't see a single
reason why it would make sense to use symlinks nowadays. We have also
supported the "new" syntax for ages now, and I'd be surprised if there
were repos out there still using symlinks on purpose.

We should probably do the above together with a new check that starts to
warn about symbolic links in "refs/" such that users become aware of
this deprecation. We'd have to grow the infrastructure to also scan root
refs though, which to the best of my knowledge we don't currently scan.
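A rough sketch of such a check, assuming POSIX opendir()/readdir() and
lstat(): walk a directory tree and warn about every symbolic link in
it.  This is not how Git's fsck code is structured; warn_symlink_refs()
and its warning text are hypothetical, and root refs like HEAD, which
live outside "refs/", would still need separate handling as noted
above.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Recursively walk dir, print a warning for every symbolic link found,
 * and return the number of symlinks (or -1 if dir cannot be opened).
 */
static int warn_symlink_refs(const char *dir)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	int count = 0;

	if (!d)
		return -1;
	while ((e = readdir(d))) {
		char path[4096];
		struct stat st;
		int n;

		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
			continue;
		n = snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue; /* path too long for this sketch; skip it */
		if (lstat(path, &st) < 0)
			continue;
		if (S_ISLNK(st.st_mode)) {
			fprintf(stderr, "warning: %s: symlink ref\n", path);
			count++;
		} else if (S_ISDIR(st.st_mode)) {
			int sub = warn_symlink_refs(path);
			if (sub > 0)
				count += sub;
		}
	}
	closedir(d);
	return count;
}
```

Because lstat() does not follow links, a symlink that points at a
directory is reported rather than descended into.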

Patrick

* Re: [PATCH v2 3/4] ref: add symbolic ref content check for files backend
  2024-08-28 15:41         ` Junio C Hamano
@ 2024-08-29 10:11           ` Patrick Steinhardt
  0 siblings, 0 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-29 10:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: shejialuo, git, Karthik Nayak

On Wed, Aug 28, 2024 at 08:41:08AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Also, I think we don't typically call the value of a symbolic ref
> > "pointee", but "target". Searching for "pointee" in our codebase only
> > gives a single hit, and that one is not related to symbolic refs.
> 
> Yesterday while I was studying for reviewing this series, I saw some
> existing code that call them "referent".  There may also be "target".

Ah, true, I totally forgot about "referent". I guess we use both, but it
would of course be great if we only had a single term to refer to them.
Referent seems to be used more widely, at least in the refs subsystem.

> >> +	if (!newline_pos || *(newline_pos + 1)) {
> >> +		ret = fsck_report_ref(o, report,
> >> +				      FSCK_MSG_REF_MISSING_NEWLINE,
> >> +				      "missing newline");
> >> +	}
> >
> > The second condition `*(newline_pos + 1)` checks whether there is any
> > data after the newline, doesn't it? That indicates a different kind of
> > error than "missing newline", namely that there is trailing garbage. I
> > guess we'd want to report a separate info-level message for this.
> >
> > Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're
> > only checking for trailing garbage after the _last_ newline, not after
> > the first one.
> 
> None of the above.  It should strbuf_rtrim() and if we removed
> anything but just a single terminating LF, we are looking at
> something we wouldn't have written.  The next check_refname_format()
> call would then find "trailing garbage".

Fair.

>  - "refs/heads/master \n " gets rtrimmed to "refs/heads/master",
>    which is "valid but curious".

Okay. This _may_ be something to generate an info message for, mostly in
the same spirit as we want to do it for direct refs.

>  - "refs/heads/main trash\n " becomes "refs/heads/main trash",
>    which is outright bad.

Yeah, this one should be an error indeed.
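Putting the rtrim rule and this classification together, a hypothetical
checker might map a symref's content (the part after "ref: ") to one of
three outcomes.  classify_symref() is illustrative only, and
looks_like_refname() is a deliberately crude stand-in for
check_refname_format() (it only requires a "refs/" prefix and rejects
whitespace), so treat it as an assumption rather than Git's rules.

```c
#include <ctype.h>
#include <string.h>

enum symref_verdict { SYMREF_OK, SYMREF_INFO, SYMREF_ERROR };

/* Crude stand-in for check_refname_format(): "refs/" prefix, no spaces. */
static int looks_like_refname(const char *name, size_t len)
{
	if (len <= 5 || strncmp(name, "refs/", 5))
		return 0;
	for (size_t i = 0; i < len; i++)
		if (isspace((unsigned char)name[i]))
			return 0;
	return 1;
}

static enum symref_verdict classify_symref(const char *content)
{
	size_t len = strlen(content);
	size_t end = len;

	/* Right-trim whitespace, as strbuf_rtrim() would. */
	while (end > 0 && isspace((unsigned char)content[end - 1]))
		end--;

	/* Garbage left after trimming, e.g. "refs/heads/main trash": error. */
	if (!looks_like_refname(content, end))
		return SYMREF_ERROR;
	/* Valid name, but anything other than one trailing LF is curious. */
	if (len - end != 1 || content[len - 1] != '\n')
		return SYMREF_INFO;
	return SYMREF_OK;
}
```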

Patrick

* Re: [PATCH v2 2/4] ref: add regular ref content check for files backend
  2024-08-28 16:32           ` Junio C Hamano
@ 2024-08-29 10:19             ` Patrick Steinhardt
  0 siblings, 0 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-08-29 10:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: shejialuo, git, Karthik Nayak

On Wed, Aug 28, 2024 at 09:32:16AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> Git 3.0 boundary might be a good time to tighten interoperability
> >> rules such that we won't accept anything we wouldn't have written
> >> ourselves (not limited to loose ref format, but this applies to
> >> anything on-disk or on-wire), but we'd need enough preparation if we
> >> want to be able to do so in the future.
> >
> > I quite like this idea.
> 
> I wouldn't say that I wrote it as a devil's advocate comment, but I
> was hoping that somebody would quote Postel in response, as the above
> advocates a directly opposite position, which I wouldn't usually
> take.

For context, this is the quote you are probably referring to: "be
conservative in what you do, be liberal in what you accept from others".

In any case, I still think it is sensible to at least warn about refs
like this. It is unexpected to me and may indicate that other tools
writing to the refdb have misunderstood the format. If there are
implementations of Git out there that intentionally use our lax parsing
to e.g. stuff additional metadata into refs, then we need to tell them
that this is not okay.

This may have been fine in the past where there was only a single ref
backend, but now with multiple ref backends the picture has changed in
my opinion.

Patrick

* [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED
  2024-08-29  4:02       ` Jeff King
  2024-08-29  4:59         ` Junio C Hamano
@ 2024-08-29 15:00         ` Junio C Hamano
  2024-08-29 17:52           ` Jeff King
  1 sibling, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29 15:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Wed, Aug 28, 2024 at 02:28:47PM -0700, Junio C Hamano wrote:
>
>> By the way, Peff, do we have MAYBE_UNUSED that can be used in a case
>> like this one?  Platforms without symbolic links supported may well
>> define NO_SYMLINK_HEAD, which makes the incoming parameters unused.
>
> Yes, it would be fine to use MAYBE_UNUSED in a case like this.

It turns out that I was, without realizing it myself, making an
oblique reference to your patch 7/6 ;-)

Perhaps something along this line?

---- >8 ----
Subject: CodingGuidelines: also mention MAYBE_UNUSED

A function that uses a parameter in one build may lose all uses of
the parameter in another build, depending on the configuration.  A
workaround for such a case, MAYBE_UNUSED, should also be mentioned
when we recommend the use of UNUSED to our developers.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/CodingGuidelines |  5 +++--
 git-compat-util.h              | 21 +++++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
index d0fc7cfe60..3263245b03 100644
--- c/Documentation/CodingGuidelines
+++ w/Documentation/CodingGuidelines
@@ -262,8 +262,9 @@ For C programs:
    like "error: unused parameter 'foo' [-Werror=unused-parameter]",
    which indicates that a function ignores its argument. If the unused
    parameter can't be removed (e.g., because the function is used as a
-   callback and has to match a certain interface), you can annotate the
-   individual parameters with the UNUSED keyword, like "int foo UNUSED".
+   callback and has to match a certain interface), you can annotate
+   the individual parameters with the UNUSED (or MAYBE_UNUSED)
+   keyword, like "int foo UNUSED".
 
  - We try to support a wide range of C compilers to compile Git with,
    including old ones.  As of Git v2.35.0 Git requires C99 (we check
diff --git c/git-compat-util.h w/git-compat-util.h
index 71b4d23f03..23307ce780 100644
--- c/git-compat-util.h
+++ w/git-compat-util.h
@@ -195,6 +195,17 @@ struct strbuf;
 #define _NETBSD_SOURCE 1
 #define _SGI_SOURCE 1
 
+/*
+ * UNUSED marks a function parameter that is always unused.
+ *
+ * A callback interface may dictate that a function accepts a
+ * parameter at that position, but the implementation of the function
+ * may not need to use the parameter.  In such a case, mark the parameter
+ * with UNUSED.
+ *
+ * When a parameter may be used or unused, depending on conditional
+ * compilation, consider using MAYBE_UNUSED instead.
+ */
 #if GIT_GNUC_PREREQ(4, 5)
 #define UNUSED __attribute__((unused)) \
 	__attribute__((deprecated ("parameter declared as UNUSED")))
@@ -649,6 +660,16 @@ static inline int git_has_dir_sep(const char *path)
 #define RESULT_MUST_BE_USED
 #endif
 
+/*
+ * MAYBE_UNUSED marks a function parameter that may be unused, but
+ * whose use is not an error.
+ *
+ * Depending on a configuration, all uses of a function parameter may
+ * become #ifdef'ed away.  Marking such a parameter with UNUSED would
+ * give a warning in a compilation where the parameter is indeed used,
+ * and not marking such a parameter would give a warning in a
+ * compilation where the parameter is unused.
+ */
 #define MAYBE_UNUSED __attribute__((__unused__))
 
 #include "compat/bswap.h"

* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-29  7:00           ` Patrick Steinhardt
@ 2024-08-29 15:07             ` Junio C Hamano
  2024-08-29 19:48             ` Jeff King
  1 sibling, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29 15:07 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jeff King, shejialuo, git

Patrick Steinhardt <ps@pks.im> writes:

>> +* Support for core.prefersymlinkrefs will be dropped.  Support for
>> +  existing repositories that use symbolic links to represent a
>> +  symbolic ref may or may not be dropped.
>> ++
>> +Cf. <20240829040215.GA4054823@coredump.intra.peff.net>
>> +
>>  == Superseded features that will not be deprecated
> ...
> We should probably do the above together with a new check that starts to
> warn about symbolic links in "refs/" such that users become aware of
> this deprecation. We'd have to grow the infrastructure to also scan root
> refs though, which to the best of my knowledge we don't currently scan.

Yup, that is why the above suggestion is on _this_ thread that is
about the "check for curiously formatted symrefs, in the hope that
we can retroactively tighten our checks later" topic.

* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-29  4:59         ` Junio C Hamano
  2024-08-29  7:00           ` Patrick Steinhardt
@ 2024-08-29 15:48           ` shejialuo
  2024-08-29 16:12             ` Junio C Hamano
  1 sibling, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-08-29 15:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On Wed, Aug 28, 2024 at 09:59:58PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > As an aside, I wonder if we should consider deprecating and eventually
> > dropping support for core.prefersymlinkrefs. I can't think of a reason
> > anybody would want to use it, and of course it makes no sense as we move
> > on to alternate backends like reftables.
> 
> Yup.  Perhaps add an entry or two to BreakingChanges document?
> 
>  Documentation/BreakingChanges.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git c/Documentation/BreakingChanges.txt w/Documentation/BreakingChanges.txt
> index 0532bfcf7f..2a85740f3c 100644
> --- c/Documentation/BreakingChanges.txt
> +++ w/Documentation/BreakingChanges.txt
> @@ -115,6 +115,12 @@ info/grafts as outdated, 2014-03-05) and will be removed.
>  +
>  Cf. <20140304174806.GA11561@sigill.intra.peff.net>.
>  
> +* Support for core.prefersymlinkrefs will be dropped.  Support for
> +  existing repositories that use symbolic links to represent a
> +  symbolic ref may or may not be dropped.
> ++
> +Cf. <20240829040215.GA4054823@coredump.intra.peff.net>
> +
>  == Superseded features that will not be deprecated
>  
>  Some features have gained newer replacements that aim to improve the design in

From my current understanding, I think I need to rebase two patches
provided by you here:

  https://lore.kernel.org/git/xmqqle0gzdyh.fsf_-_@gitster.g/
  https://lore.kernel.org/git/xmqqbk1cz69c.fsf@gitster.g/

I think this patch just informs users that we will drop
"core.prefersymlinkrefs" later, so I do not need to be concerned
with it, nor with [PATCH 8/6].

Thanks,
Jialuo

* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-29 15:48           ` shejialuo
@ 2024-08-29 16:12             ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29 16:12 UTC (permalink / raw)
  To: shejialuo; +Cc: Jeff King, git

shejialuo <shejialuo@gmail.com> writes:

> From my current understanding, I think I need to rebase two patches
> provided by you here:
>
>   https://lore.kernel.org/git/xmqqle0gzdyh.fsf_-_@gitster.g/
>   https://lore.kernel.org/git/xmqqbk1cz69c.fsf@gitster.g/

They are to be squashed into your patch, "suggested edit" for your
changes, not "to be rebased".  In other words, we do not want to see
a patch (from your v2 as-is) to create problems and then another
patch (taken from one of these links) applied on top to remedy them.
We instead want to see a patch (start from your v2 but with the
changes from these links) that does not introduce problems in the
first place.

* Re: [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED
  2024-08-29 15:00         ` [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED Junio C Hamano
@ 2024-08-29 17:52           ` Jeff King
  2024-08-29 18:06             ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: Jeff King @ 2024-08-29 17:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Aug 29, 2024 at 08:00:19AM -0700, Junio C Hamano wrote:

> > Yes, it would be fine to use MAYBE_UNUSED in a case like this.
> 
> It turns out that I was, without realizing it myself, making an
> oblique reference to your patch 7/6 ;-)
> 
> Perhaps something along this line?

Yeah, this looks good. A few small comments below (but I'm not sure
anything needs to be changed).

> diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
> index d0fc7cfe60..3263245b03 100644
> --- c/Documentation/CodingGuidelines
> +++ w/Documentation/CodingGuidelines
> @@ -262,8 +262,9 @@ For C programs:
>     like "error: unused parameter 'foo' [-Werror=unused-parameter]",
>     which indicates that a function ignores its argument. If the unused
>     parameter can't be removed (e.g., because the function is used as a
> -   callback and has to match a certain interface), you can annotate the
> -   individual parameters with the UNUSED keyword, like "int foo UNUSED".
> +   callback and has to match a certain interface), you can annotate
> +   the individual parameters with the UNUSED (or MAYBE_UNUSED)
> +   keyword, like "int foo UNUSED".

Here I was going to suggest explaining why you'd use one or the other
(because I'm afraid of people using MAYBE_UNUSED when UNUSED would be
more appropriate). But I think the extra comments you added later are
even better, as it lets us explain without cluttering up the
CodingGuidelines document.

> +/*
> + * UNUSED marks a function parameter that is always unused.
> + *
> + * A callback interface may dictate that a function accepts a
> + * parameter at that position, but the implementation of the function
> + * may not need to use the parameter.  In such a case, mark the parameter
> + * with UNUSED.
> + *
> + * When a parameter may be used or unused, depending on conditional
> + * compilation, consider using MAYBE_UNUSED instead.
> + */

Looks good.

> +/*
> + * MAYBE_UNUSED marks a function parameter that may be unused, but
> + * whose use is not an error.
> + *
> + * Depending on a configuration, all uses of a function parameter may
> + * become #ifdef'ed away.  Marking such a parameter with UNUSED would
> + * give a warning in a compilation where the parameter is indeed used,
> + * and not marking such a parameter would give a warning in a
> + * compilation where the parameter is unused.
> + */
>  #define MAYBE_UNUSED __attribute__((__unused__))

This is all good as pertains to function parameters. But the original
reason we added MAYBE_UNUSED was actually for static functions that were
auto-generated by the commit-slab macros. Saying "...marks a function
parameter" implies to me that it's the only use. I don't know if we want
to be more expansive here or not. Adding auto-generated macro functions
should be quite a rarity, I'd think.
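To illustrate that case, here is a sketch of a macro that stamps out a
family of static helpers, of which a given file may call only some.
DEFINE_COUNTER() is hypothetical and far simpler than the real
commit-slab.h macros; without MAYBE_UNUSED, -Wunused-function would
warn about each generated helper that a file never calls.

```c
/* Local stand-in for Git's MAYBE_UNUSED, guarded for non-GNU compilers. */
#if defined(__GNUC__)
#define MAYBE_UNUSED __attribute__((__unused__))
#else
#define MAYBE_UNUSED /* noop */
#endif

/* Generate a tiny counter type plus helpers; callers may use only some. */
#define DEFINE_COUNTER(name) \
	struct name { int value; }; \
	MAYBE_UNUSED static void name##_init(struct name *c) { c->value = 0; } \
	MAYBE_UNUSED static void name##_inc(struct name *c) { c->value++; } \
	MAYBE_UNUSED static int name##_get(const struct name *c) { return c->value; }

DEFINE_COUNTER(hits)
```

Here the attribute sits in front of "static", the same position in
which commit-slab.h passes MAYBE_UNUSED along with the "static" scope.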

-Peff

* Re: [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED
  2024-08-29 17:52           ` Jeff King
@ 2024-08-29 18:06             ` Junio C Hamano
  2024-08-29 18:18               ` [PATCH v2] " Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29 18:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>> +/*
>> + * MAYBE_UNUSED marks a function parameter that may be unused, but
>> + * whose use is not an error.
>> + *
>> + * Depending on a configuration, all uses of a function parameter may
>> + * become #ifdef'ed away.  Marking such a parameter with UNUSED would
>> + * give a warning in a compilation where the parameter is indeed used,
>> + * and not marking such a parameter would give a warning in a
>> + * compilation where the parameter is unused.
>> + */
>>  #define MAYBE_UNUSED __attribute__((__unused__))
>
> This is all good as pertains to function parameters. But the original
> reason we added MAYBE_UNUSED was actually for static functions that were
> auto-generated by the commit-slab macros. Saying "...marks a function
> parameter" implies to me that it's the only use. I don't know if we want
> to be more expansive here or not. Adding auto-generated macro functions
> should be quite a rarity, I'd think.

True.  You can annotate types, variables, and functions with the
attributes as well.  How about saying something like this

    MAYBE_UNUSED marks a function parameter that may be unused but
    whose use is not an error.  It also can be applied to functions,
    types and variables.

and then keep the explanation of why you may want to use the maybe-
variant as-is, using a function parameter as an example?  Or I could
rewrite "parameter" and "function parameter" in it with "thing"
(with double quotes around), like:

    Depending on a configuration, all uses of a "thing" may become
    #ifdef'ed away....

Unlike the use of the deprecated attribute, our definition of
MAYBE_UNUSED is not guarded with anything.  Shouldn't we at least do

    #if defined(__GNUC__)
    #define MAYBE_UNUSED __attribute__((__unused__))
    #else
    #define MAYBE_UNUSED /* noop */
    #endif

or something, by the way?

Thanks.

* [PATCH v2] CodingGuidelines: also mention MAYBE_UNUSED
  2024-08-29 18:06             ` Junio C Hamano
@ 2024-08-29 18:18               ` Junio C Hamano
  2024-08-29 18:27                 ` [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__ Junio C Hamano
  2024-08-29 19:40                 ` [PATCH v2] CodingGuidelines: also mention MAYBE_UNUSED Jeff King
  0 siblings, 2 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29 18:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git

A function that uses a parameter in one build may lose all uses of
the parameter in another build, depending on the configuration.  A
workaround for such a case, MAYBE_UNUSED, should also be mentioned
when we recommend the use of UNUSED to our developers.

Keep the addition to the guideline short and document the criteria
to choose between UNUSED and MAYBE_UNUSED near their definition.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/CodingGuidelines |  5 +++--
 git-compat-util.h              | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index d0fc7cfe60..3263245b03 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -262,8 +262,9 @@ For C programs:
    like "error: unused parameter 'foo' [-Werror=unused-parameter]",
    which indicates that a function ignores its argument. If the unused
    parameter can't be removed (e.g., because the function is used as a
-   callback and has to match a certain interface), you can annotate the
-   individual parameters with the UNUSED keyword, like "int foo UNUSED".
+   callback and has to match a certain interface), you can annotate
+   the individual parameters with the UNUSED (or MAYBE_UNUSED)
+   keyword, like "int foo UNUSED".
 
  - We try to support a wide range of C compilers to compile Git with,
    including old ones.  As of Git v2.35.0 Git requires C99 (we check
diff --git a/git-compat-util.h b/git-compat-util.h
index 71b4d23f03..e4a306dd56 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -195,6 +195,19 @@ struct strbuf;
 #define _NETBSD_SOURCE 1
 #define _SGI_SOURCE 1
 
+/*
+ * UNUSED marks a function parameter that is always unused.  It also
+ * can be used to annotate a function, a variable, or a type that is
+ * always unused.
+ *
+ * A callback interface may dictate that a function accepts a
+ * parameter at that position, but the implementation of the function
+ * may not need to use the parameter.  In such a case, mark the parameter
+ * with UNUSED.
+ *
+ * When a parameter may be used or unused, depending on conditional
+ * compilation, consider using MAYBE_UNUSED instead.
+ */
 #if GIT_GNUC_PREREQ(4, 5)
 #define UNUSED __attribute__((unused)) \
 	__attribute__((deprecated ("parameter declared as UNUSED")))
@@ -649,6 +662,17 @@ static inline int git_has_dir_sep(const char *path)
 #define RESULT_MUST_BE_USED
 #endif
 
+/*
+ * MAYBE_UNUSED marks a function parameter that may be unused, but
+ * whose use is not an error.  It also can be used to annotate a
+ * function, a variable, or a type that may be unused.
+ *
+ * Depending on a configuration, all uses of such a thing may become
+ * #ifdef'ed away.  Marking it with UNUSED would give a warning in a
+ * compilation where it is indeed used, and not marking it at all
+ * would give a warning in a compilation where it is unused.  In such
+ * a case, MAYBE_UNUSED is the appropriate annotation to use.
+ */
 #define MAYBE_UNUSED __attribute__((__unused__))
 
 #include "compat/bswap.h"

Interdiff against v1:
  diff --git a/git-compat-util.h b/git-compat-util.h
  index 23307ce780..e4a306dd56 100644
  --- a/git-compat-util.h
  +++ b/git-compat-util.h
  @@ -196,7 +196,9 @@ struct strbuf;
   #define _SGI_SOURCE 1
   
   /*
  - * UNUSED marks a function parameter that is always unused.
  + * UNUSED marks a function parameter that is always unused.  It also
  + * can be used to annotate a function, a variable, or a type that is
  + * always unused.
    *
    * A callback interface may dictate that a function accepts a
    * parameter at that position, but the implementation of the function
  @@ -662,13 +664,14 @@ static inline int git_has_dir_sep(const char *path)
   
   /*
    * MAYBE_UNUSED marks a function parameter that may be unused, but
  - * whose use is not an error.
  + * whose use is not an error.  It also can be used to annotate a
  + * function, a variable, or a type that may be unused.
    *
  - * Depending on a configuration, all uses of a function parameter may
  - * become #ifdef'ed away.  Marking such a parameter with UNUSED would
  - * give a warning in a compilation where the parameter is indeed used,
  - * and not marking such a parameter would give a warning in a
  - * compilation where the parameter is unused.
  + * Depending on a configuration, all uses of such a thing may become
  + * #ifdef'ed away.  Marking it with UNUSED would give a warning in a
  + * compilation where it is indeed used, and not marking it at all
  + * would give a warning in a compilation where it is unused.  In such
  + * a case, MAYBE_UNUSED is the appropriate annotation to use.
    */
   #define MAYBE_UNUSED __attribute__((__unused__))
   
-- 
2.46.0-563-gaeb9b172ce



* [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__
  2024-08-29 18:18               ` [PATCH v2] " Junio C Hamano
@ 2024-08-29 18:27                 ` Junio C Hamano
  2024-08-29 19:45                   ` Jeff King
  2024-08-29 19:40                 ` [PATCH v2] CodingGuidelines: also mention MAYBE_UNUSED Jeff King
  1 sibling, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29 18:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Just like we only define UNUSED macro when __GNUC__ is defined,
and fall back to an empty definition otherwise, we should do the
same for MAYBE_UNUSED.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 * Before I forget that we have discussed this, just as a
   documentation (read: this is not a patch to be applied).

   I think this only matters when a compiler satisfies all three
   traits:

   - does not define __GNUC__
   - does have its own __attribute__() macro
   - barfs on __attribute__((__unused__))

   Otherwise we will define __attribute__(x) away to empty to cause
   no harm.

   Since we have survived without complaints without such a guard
   for quite some time, it may be a sign that no compiler that knows
   __attribute__() that people ever tried to compile Git with barfs
   with __attribute__((__unused__)).  I dunno.

 git-compat-util.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index e4a306dd56..74ed581b5d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -673,7 +673,11 @@ static inline int git_has_dir_sep(const char *path)
  * would give a warning in a compilation where it is unused.  In such
  * a case, MAYBE_UNUSED is the appropriate annotation to use.
  */
+#ifdef __GNUC__
 #define MAYBE_UNUSED __attribute__((__unused__))
+#else
+#define MAYBE_UNUSED
+#endif
 
 #include "compat/bswap.h"
 
-- 
2.46.0-563-gaeb9b172ce



* Re: [PATCH v2] CodingGuidelines: also mention MAYBE_UNUSED
  2024-08-29 18:18               ` [PATCH v2] " Junio C Hamano
  2024-08-29 18:27                 ` [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__ Junio C Hamano
@ 2024-08-29 19:40                 ` Jeff King
  1 sibling, 0 replies; 209+ messages in thread
From: Jeff King @ 2024-08-29 19:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Aug 29, 2024 at 11:18:06AM -0700, Junio C Hamano wrote:

> +/*
> + * MAYBE_UNUSED marks a function parameter that may be unused, but
> + * whose use is not an error.  It also can be used to annotate a
> + * function, a variable, or a type that may be unused.
> + *
> + * Depending on a configuration, all uses of such a thing may become
> + * #ifdef'ed away.  Marking it with UNUSED would give a warning in a
> + * compilation where it is indeed used, and not marking it at all
> + * would give a warning in a compilation where it is unused.  In such
> + * a case, MAYBE_UNUSED is the appropriate annotation to use.
> + */
>  #define MAYBE_UNUSED __attribute__((__unused__))

Thanks, I think this is good. There's more nuanced discussion about when
the "MAYBE" variant could be used for non-parameters, but I don't know
that it's worth trying to enumerate every place we've found it useful.

-Peff


* Re: [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__
  2024-08-29 18:27                 ` [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__ Junio C Hamano
@ 2024-08-29 19:45                   ` Jeff King
  2024-08-29 20:19                     ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: Jeff King @ 2024-08-29 19:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Aug 29, 2024 at 11:27:39AM -0700, Junio C Hamano wrote:

> Just like we only define UNUSED macro when __GNUC__ is defined,
> and fall back to an empty definition otherwise, we should do the
> same for MAYBE_UNUSED.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  * Before I forget that we have discussed this, just as a
>    documentation (read: this is not a patch to be applied).
> 
>    I think this only matters when a compiler satisfies all three
>    traits:
> 
>    - does not define __GNUC__
>    - does have its own __attribute__() macro
>    - barfs on __attribute__((__unused__))
> 
>    Otherwise we will define __attribute__(x) away to empty to cause
>    no harm.
> 
>    Since we have survived without complaints without such a guard
>    for quite some time, it may be a sign that no compiler that knows
>    __attribute__() that people ever tried to compile Git with barfs
>    with __attribute__((__unused__)).  I dunno.

Yeah, I was surprised that this didn't have a guard and was not
currently barfing on other compilers. And the answer is that we already
turn __attribute__ into a noop on non-GNUC platforms.

Which made me wonder if UNUSED really needs its guards. It does, because
it is defined early in the file, before the __attribute__ handling. I
don't think we want to move it down, since it needs to be available for
use by inline'd compat wrappers. But arguably we should move the
attribute macro earlier in the file?
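The ordering point can be sketched in a self-contained way. This is only an illustration of the layout being discussed, not a copy of git-compat-util.h. `compat_double()`, `later_helper()` and the `DEBUG_CHECKS` knob are hypothetical. A macro body is expanded at the point of use, so the question is whether the `__attribute__` fallback already exists when UNUSED is *expanded*, not when it is defined:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>

/* Early in the header: guarded, because it is expanded before the fallback. */
#if defined(__GNUC__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif

/* A hypothetical inline compat wrapper that appears before the fallback. */
static inline int compat_double(int value, void *cb_data UNUSED)
{
	return value * 2;
}

/* Later in the header: make __attribute__ a no-op on non-GNU compilers. */
#ifndef __GNUC__
#ifndef __attribute__
#define __attribute__(x)
#endif
#endif

/* Expanded only after the fallback, so no guard of its own is needed. */
#define MAYBE_UNUSED __attribute__((__unused__))

static int later_helper(int value MAYBE_UNUSED)
{
#ifdef DEBUG_CHECKS
	assert(value >= 0);
#endif
	return 1;
}
```

Without the guard, `compat_double()` would hand a raw `__attribute__((unused))` to a non-GNU compiler, because the fallback is not yet defined at that point.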

I don't know that it is really worth spending too much time futzing
with, though.

> diff --git a/git-compat-util.h b/git-compat-util.h
> index e4a306dd56..74ed581b5d 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -673,7 +673,11 @@ static inline int git_has_dir_sep(const char *path)
>   * would give a warning in a compilation where it is unused.  In such
>   * a case, MAYBE_UNUSED is the appropriate annotation to use.
>   */
> +#ifdef __GNUC__
>  #define MAYBE_UNUSED __attribute__((__unused__))
> +#else
> +#define MAYBE_UNUSED
> +#endif

So yeah, I'm not necessarily opposed to this, but I don't think it's
really doing anything in practice.

-Peff


* Re: [PATCH v2 0/4] add ref content check for files backend
  2024-08-29  7:00           ` Patrick Steinhardt
  2024-08-29 15:07             ` Junio C Hamano
@ 2024-08-29 19:48             ` Jeff King
  1 sibling, 0 replies; 209+ messages in thread
From: Jeff King @ 2024-08-29 19:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, shejialuo, git

On Thu, Aug 29, 2024 at 09:00:58AM +0200, Patrick Steinhardt wrote:

> > diff --git c/Documentation/BreakingChanges.txt w/Documentation/BreakingChanges.txt
> > index 0532bfcf7f..2a85740f3c 100644
> > --- c/Documentation/BreakingChanges.txt
> > +++ w/Documentation/BreakingChanges.txt
> > @@ -115,6 +115,12 @@ info/grafts as outdated, 2014-03-05) and will be removed.
> >  +
> >  Cf. <20140304174806.GA11561@sigill.intra.peff.net>.
> >  
> > +* Support for core.prefersymlinkrefs will be dropped.  Support for
> > +  existing repositories that use symbolic links to represent a
> > +  symbolic ref may or may not be dropped.
> > ++
> > +Cf. <20240829040215.GA4054823@coredump.intra.peff.net>
> > +
> >  == Superseded features that will not be deprecated
> 
> Yes, I'm very much in favor of that. As Peff said, I don't see a single
> reason why it would make sense to use symlinks nowadays. We have also
> supported the "new" syntax for ages now, and I'd be surprised if there
> were repos out there using it on purpose.
> 
> We should probably do the above together with a new check that starts to
> warn about symbolic links in "refs/" such that users become aware of
> this deprecation. We'd have to grow the infrastructure to also scan root
> refs though, which to the best of my knowledge we don't currently scan.

I think the first step of the proposal (and what I had written in the
patches that I linked) was just that we would stop _writing_ symlinks.
And there we'd only need to warn people who have that config option set.

Whether to drop the reading side is less clear to me. I think in the
long run it is good as a cleanup (and one less source of weird behavior
that malicious local repos can trigger). But that decision can be made
separately. I think it would be OK to just issue a deprecation warning
whenever we actually follow a symlink (because I think we do so
manually, since we need to know the target name as part of the
resolution process).
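A rough sketch of "follow the symlink manually and warn" using POSIX `lstat()` and `readlink()`. The helper name `read_symlink_ref()` and the warning text are hypothetical. They are not Git's code or Git's message. For brevity, a target longer than the buffer is silently truncated:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: if "path" is a symbolic link, copy its target
 * into "buf" and print a deprecation warning.
 * Returns 1 for a symlink, 0 for anything else, -1 on error.
 */
static int read_symlink_ref(const char *path, char *buf, size_t bufsz)
{
	struct stat st;
	ssize_t len;

	assert(bufsz > 0);
	if (lstat(path, &st) < 0)
		return -1;
	if (!S_ISLNK(st.st_mode))
		return 0;

	len = readlink(path, buf, bufsz - 1);
	if (len < 0)
		return -1;
	buf[len] = '\0'; /* readlink() does not NUL-terminate */

	fprintf(stderr, "warning: %s: symbolic link refs are deprecated\n", path);
	return 1;
}
```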

-Peff


* Re: [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__
  2024-08-29 19:45                   ` Jeff King
@ 2024-08-29 20:19                     ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-08-29 20:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Thu, Aug 29, 2024 at 11:27:39AM -0700, Junio C Hamano wrote:
>
>> Just like we only define UNUSED macro when __GNUC__ is defined,
>> and fall back to an empty definition otherwise, we should do the
>> same for MAYBE_UNUSED.
>> 
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>  * Before I forget that we have discussed this, just as a
>>    documentation (read: this is not a patch to be applied).
>> 
>>    I think this only matters when a compiler satisfies all three
>>    traits:
>> 
>>    - does not define __GNUC__
>>    - does have its own __attribute__() macro
>>    - barfs on __attribute__((__unused__))
>> 
>>    Otherwise we will define __attribute__(x) away to empty to cause
>>    no harm.
>> 
>>    Since we have survived without complaints without such a guard
>>    for quite some time, it may be a sign that no compiler that knows
>>    __attribute__() that people ever tried to compile Git with barfs
>>    with __attribute__((__unused__)).  I dunno.
>
> Yeah, I was surprised that this didn't have a guard and was not
> currently barfing on other compilers. And the answer is that we already
> turn __attribute__ into a noop on non-GNUC platforms.

Plus these non-GNUC platforms either

 (1) do not have their own __attribute__, which lets us turn
     __attribute__() into noop, or

 (2) have their own __attribute__, but they happen to support
     __attribute__((__unused__)).

If somebody has __attribute__() and does not support (__unused__) in
it, use of MAYBE_UNUSED would be broken (maybe their __attribute__()
supports other things but not unused).

> Which made me wonder if UNUSED really needs its guards. It does, because
> it is defined early in the file, before the __attribute__ handling. I
> don't think we want to move it down, since it needs to be available for
> use by inline'd compat wrappers. But arguably we should move the
> attribute macro earlier in the file?

And moving the __attribute__ definition earlier in the file would not
help such a platform with a broken __attribute__((__unused__)).

> I don't know that it is really worth spending too much time futzing
> with, though.

I am inclined to think it is not.  So let's scrap the patch.  The
list archive will hopefully remember when it becomes necessary ;-)


* [PATCH v3 0/4] add ref content check for files backend
  2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
                       ` (5 preceding siblings ...)
  2024-08-28 21:28     ` [PATCH v2 0/4] add ref content check for files backend Junio C Hamano
@ 2024-09-03 12:18     ` shejialuo
  2024-09-03 12:20       ` [PATCH v3 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
                         ` (4 more replies)
  6 siblings, 5 replies; 209+ messages in thread
From: shejialuo @ 2024-09-03 12:18 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This new version does the following things:

1. [PATCH v3 1/4]

    + The motivation in the previous commit message was overstated; this
    version tones it down.

2. [PATCH v3 2/4]

    + Enhance the commit message to make things clearer.
    + Enhance the descriptions of the fsck message ids. Tell the user that
    we may consider turning these infos into errors later, so that users
    are aware of this and can report feedback to us.
    + Use "goto" to avoid unnecessary indentation.
    + Use "test_commit" to create a single commit in the test file to
    avoid unnecessary setup.
    + Enhance the test cases by adding a test for the normal case and a
    new aggregate test to double-check the functionality.
    + Change "> $file" to ">$file" to follow the code style.

3. [PATCH v3 3/4]

    + Enhance the commit message by better describing the motivation.
    + Rename the fsck message "badSymrefPointee" to "badSymrefTarget" to
    align with the codebase, and explain in more detail in the
    documentation what makes a target "bad".
    + Use Junio's idea to check the textual symref content.
    + Still keep the following code:

        if (lstat(referent->buf, &st))
            goto out;

        if (S_ISDIR(st.st_mode)) {
            ret = report(...);
            goto out;
        }

      This is because we cannot tell whether "refs/heads/a" is a regular
      ref or a directory by using "check_refname_format", so we have to
      add this check. It may seem that we already do this when iterating
      the "refs" directory. However, that iteration reports an error for
      file types other than symlinks and regular files, but it does not
      report directories. So we need to check explicitly here.

    + Like [PATCH v3 2/4], enhance the test code.

4. [PATCH v3 4/4]

    + Enhance the commit message.
    + Introduce a new fsck info "symlinkRef" to warn the user that such
    refs will be reported as an error once we drop support for symlink
    refs.
    + Squash the following two patches into this patch:
      https://lore.kernel.org/git/xmqqle0gzdyh.fsf_-_@gitster.g/
      https://lore.kernel.org/git/xmqqbk1cz69c.fsf@gitster.g/
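The content rules described for patch 2/4 (and exercised in t0602) can be sketched as a standalone classifier. This is a simplified illustration, not the code in refs/files-backend.c. It assumes SHA-1 (40 hex digits), whereas the real check goes through parse_loose_ref_contents() with the repository's hash algorithm. The enum and function names are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result codes mirroring the fsck message ids in patch 2/4. */
enum ref_content_status {
	REF_CONTENT_OK,
	REF_CONTENT_BAD,             /* badRefContent (ERROR) */
	REF_CONTENT_MISSING_NEWLINE, /* refMissingNewline (INFO) */
	REF_CONTENT_TRAILING,        /* trailingRefContent (INFO) */
	REF_CONTENT_SYMREF           /* "ref: ..." is checked separately */
};

#define HEXSZ 40 /* assumption: SHA-1 repositories only */

static enum ref_content_status classify_ref_content(const char *buf)
{
	const char *trailing;
	size_t i;

	assert(buf);
	if (!strncmp(buf, "ref:", 4))
		return REF_CONTENT_SYMREF;

	/* Stops at the first non-hex byte, including an early NUL. */
	for (i = 0; i < HEXSZ; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return REF_CONTENT_BAD;

	/* After the object name, only NUL or whitespace is acceptable. */
	trailing = buf + HEXSZ;
	if (*trailing != '\0' && !isspace((unsigned char)*trailing))
		return REF_CONTENT_BAD;
	if (*trailing == '\0')
		return REF_CONTENT_MISSING_NEWLINE;
	if (trailing[0] != '\n' || trailing[1] != '\0')
		return REF_CONTENT_TRAILING;
	return REF_CONTENT_OK;
}
```

As in the patch, anything beyond a single trailing newline (even "\n\n\n") is reported as trailing content, and only content that cannot be parsed at all is an error.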

Thanks,
Jialuo


shejialuo (4):
  ref: initialize "fsck_ref_report" with zero
  ref: add regular ref content check for files backend
  ref: add symref content check for files backend
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  20 ++
 fsck.h                        |   5 +
 refs.c                        |   2 +-
 refs/files-backend.c          | 205 ++++++++++++++++++++-
 refs/refs-internal.h          |   2 +-
 t/t0602-reffiles-fsck.sh      | 334 ++++++++++++++++++++++++++++++++++
 6 files changed, 556 insertions(+), 12 deletions(-)
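For the directory check kept in patch 3/4, a minimal standalone sketch. The helper name is hypothetical. The idea is that check_refname_format() looks only at the name, so finding out whether the symref target is a directory needs an lstat() on the path. A missing target is not treated as an error here, matching the "goto out" in the snippet quoted in the cover letter:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 when the symref target exists on disk
 * and is a directory, 0 otherwise (including when it does not exist).
 */
static int symref_target_is_directory(const char *target_path)
{
	struct stat st;

	assert(target_path);
	if (lstat(target_path, &st))
		return 0; /* missing target: not reported by this check */
	return S_ISDIR(st.st_mode) ? 1 : 0;
}
```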

Range-diff against v2:
1:  c49a216b70 ! 1:  9fdab751c1 ref: initialize "fsck_ref_report" with zero
    @@ Commit message
         NULL instead of letting them point to anywhere when creating a new
         "fsck_ref_report" structure.
     
    -    The original code explicitly specifies the ".path" field to initialize
    -    the "fsck_ref_report" structure. However, it introduces confusion how we
    -    initialize the other fields. In order to avoid this, initialize the
    -    "fsck_ref_report" with zero to make clear that everything in
    -    "fsck_ref_report" is zero initialized.
    +    The original code explicitly initializes the "path" member in the
    +    "struct fsck_ref_report" to NULL (which implicitly 0-initializes other
    +    members in the struct). It is more customary to use " {0} " to express
    +    that we are 0-initializing everything. In order to be align with the the
    +    codebase, initialize "fsck_ref_report" with zero.
     
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
    @@ refs/files-backend.c: static int files_fsck_refs_name(struct ref_store *ref_stor
      
      	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
     -		struct fsck_ref_report report = { .path = NULL };
    -+		struct fsck_ref_report report = {0};
    ++		struct fsck_ref_report report = { 0 };
      
      		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
      		report.path = sb.buf;
2:  99e37b0304 ! 2:  4640b6e345 ref: add regular ref content check for files backend
    @@ Commit message
         We implicitly rely on "git-fsck(1)" to check the consistency of regular
         refs. However, when parsing the regular refs for files backend by using
         "files-backend.c::parse_loose_ref_contents", we allow the ref content to
    -    be end with no newline or contain some garbages.
    +    end with no newline or to contain some garbages.
     
    -    It may seem that we should report an error or warn fsck message to the
    -    user about above situations. However, there may be some third-party
    -    tools customizing the content of refs. We should not report an error
    -    fsck message.
    +    Even though we never create such loose refs ourselves, we have accepted
    +    such loose refs. So, it is entirely possible that some third-party tools
    +    may rely on such loose refs being valid. We should not report an error
    +    fsck message at current. But let's notice such a "curiously formatted"
    +    loose refs being valid and tell the user our findings, so we can access
    +    the possible extent of damage when we tighten the parsing rules in the
    +    future.
     
    -    And we cannot either report a warn fsck message to the user. This is
    -    because if the caller set the "strict" field in "fsck_options" to
    -    to upgrade the fsck warnings to errors.
    +    And it's not suitable to either report a warn fsck message to the user.
    +    This is because if the caller set the "strict" field in "fsck_options",
    +    fsck warns will be automatically upgraded to errors. We should not allow
    +    user to specify the "--strict" flag to upgrade the fsck warnings to
    +    errors at current. It might cause compatibility issue which may break
    +    the legacy repository. So we add the following two fsck infos to
    +    represent the situation where the ref content ends without newline or has
    +    garbages:
     
    -    We should not allow the user to upgrade the fsck warnings to errors. It
    -    might cause compatibility issue which will break the legacy repository.
    -    So we add the following two fsck infos to represent the situation where
    -    the ref content ends without newline or has garbages:
    +    1. "refMissingNewline(INFO)": A ref does not end with newline. This kind
    +       of ref may be considered ERROR in the future.
    +    2. "trailingRefContent(INFO)": A ref has trailing contents. This kind of
    +       ref may be considered ERROR in the future.
     
    -    1. "refMissingNewline(INFO)": A valid ref does not end with newline.
    -    2. "trailingRefContent(INFO)": A ref has trailing contents.
    -
    -    In "fsck.c::fsck_vreport", we will convert "FSCK_INFO" to "FSCK_WARN",
    -    and we can still warn the user about these situations when using
    -    "git-refs verify" without introducing compatibility issue.
    +    It may seem that we could not give the user any warnings by creating
    +    fsck infos. However, in "fsck.c::fsck_vreport", we will convert
    +    "FSCK_INFO" to "FSCK_WARN" and we can still warn the user about these
    +    situations when using "git-refs verify" without introducing
    +    compatibility issue.
     
         In current "git-fsck(1)", it will report an error when the ref content
         is bad, so we should following this to report an error to the user when
    @@ Documentation/fsck-msgids.txt
      	(WARN) Tree contains entries pointing to a null sha1.
      
     +`refMissingNewline`::
    -+	(INFO) A valid ref does not end with newline.
    ++	(INFO) A ref does not end with newline. This kind of ref may
    ++	be considered ERROR in the future.
     +
     +`trailingRefContent`::
    -+	(INFO) A ref has trailing contents.
    ++	(INFO) A ref has trailing contents. This kind of ref may be
    ++	considered ERROR in the future.
     +
      `treeNotSorted`::
      	(ERROR) A tree is not properly sorted.
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
     +	report.path = refname.buf;
     +
    -+	if (S_ISREG(iter->st.st_mode)) {
    -+		if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
    -+			ret = error_errno(_("%s/%s: unable to read the ref"),
    -+					  refs_check_dir, iter->relative_path);
    -+			goto cleanup;
    -+		}
    ++	if (S_ISLNK(iter->st.st_mode))
    ++		goto cleanup;
    ++
    ++	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
    ++		ret = error_errno(_("%s/%s: unable to read the ref"),
    ++				  refs_check_dir, iter->relative_path);
    ++		goto cleanup;
    ++	}
     +
    -+		if (parse_loose_ref_contents(ref_store->repo->hash_algo,
    -+					     ref_content.buf, &oid, &referent,
    -+					     &type, &trailing, &failure_errno)) {
    ++	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
    ++				     ref_content.buf, &oid, &referent,
    ++				     &type, &trailing, &failure_errno)) {
    ++		ret = fsck_report_ref(o, &report,
    ++				      FSCK_MSG_BAD_REF_CONTENT,
    ++				      "invalid ref content");
    ++		goto cleanup;
    ++	}
    ++
    ++	if (!(type & REF_ISSYMREF)) {
    ++		if (*trailing == '\0') {
     +			ret = fsck_report_ref(o, &report,
    -+					      FSCK_MSG_BAD_REF_CONTENT,
    -+					      "invalid ref content");
    ++					      FSCK_MSG_REF_MISSING_NEWLINE,
    ++					      "missing newline");
     +			goto cleanup;
     +		}
     +
    -+		if (!(type & REF_ISSYMREF)) {
    -+			if (*trailing == '\0') {
    -+				ret = fsck_report_ref(o, &report,
    -+						      FSCK_MSG_REF_MISSING_NEWLINE,
    -+						      "missing newline");
    -+				goto cleanup;
    -+			}
    -+
    -+			if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
    -+				ret = fsck_report_ref(o, &report,
    -+						      FSCK_MSG_TRAILING_REF_CONTENT,
    -+						      "trailing garbage in ref");
    -+				goto cleanup;
    -+			}
    ++		if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
    ++			ret = fsck_report_ref(o, &report,
    ++					      FSCK_MSG_TRAILING_REF_CONTENT,
    ++					      "trailing garbage in ref");
    ++			goto cleanup;
     +		}
    -+		goto cleanup;
     +	}
     +
     +cleanup:
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name check should be adapted
      	test_must_be_empty err
      '
      
    -+test_expect_success 'regular ref content should be checked' '
    ++test_expect_success 'regular ref content should be checked (individual)' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	branch_dir_prefix=.git/refs/heads &&
     +	tag_dir_prefix=.git/refs/tags &&
     +	cd repo &&
    -+	git commit --allow-empty -m initial &&
    -+	git checkout -b branch-1 &&
    -+	git tag tag-1 &&
    -+	git commit --allow-empty -m second &&
    -+	git checkout -b branch-2 &&
    -+	git tag tag-2 &&
    -+	git checkout -b a/b/tag-2 &&
    ++	test_commit default &&
    ++	mkdir -p "$branch_dir_prefix/a/b" &&
     +
    -+	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&
    ++	git refs verify 2>err &&
    ++	test_must_be_empty err &&
    ++
    ++	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
    ++	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
     +	EOF
    -+	rm $branch_dir_prefix/branch-1-no-newline &&
    ++	rm $branch_dir_prefix/branch-no-newline &&
     +	test_cmp expect err &&
     +
    -+	printf "%s garbage" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-garbage &&
    ++	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/heads/branch-1-garbage: trailingRefContent: trailing garbage in ref
    ++	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $branch_dir_prefix/branch-1-garbage &&
    ++	rm $branch_dir_prefix/branch-garbage &&
     +	test_cmp expect err &&
     +
    -+	printf "%s\n\n\n" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
    ++	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
    ++	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $tag_dir_prefix/tag-1-garbage &&
    ++	rm $tag_dir_prefix/tag-garbage-1 &&
     +	test_cmp expect err &&
     +
    -+	printf "%s\n\n\n  garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
    ++	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
    ++	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $tag_dir_prefix/tag-1-garbage &&
    ++	rm $tag_dir_prefix/tag-garbage-2 &&
     +	test_cmp expect err &&
     +
    -+	printf "%s    garbage\n\na" "$(git rev-parse tag-2)" > $tag_dir_prefix/tag-2-garbage &&
    ++	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/tags/tag-2-garbage: trailingRefContent: trailing garbage in ref
    ++	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $tag_dir_prefix/tag-2-garbage &&
    ++	rm $tag_dir_prefix/tag-garbage-3 &&
     +	test_cmp expect err &&
     +
    -+	printf "%s garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
    ++	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
     +	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
    ++	error: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $tag_dir_prefix/tag-1-garbage &&
    ++	rm $tag_dir_prefix/tag-garbage-4 &&
     +	test_cmp expect err &&
     +
    -+	printf "%sx" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-bad &&
    ++	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/tags/tag-1-bad: badRefContent: invalid ref content
    ++	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
     +	EOF
    -+	rm $tag_dir_prefix/tag-1-bad &&
    ++	rm $tag_dir_prefix/tag-bad-1 &&
     +	test_cmp expect err &&
     +
    -+	printf "xfsazqfxcadas" > $tag_dir_prefix/tag-2-bad &&
    ++	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/tags/tag-2-bad: badRefContent: invalid ref content
    ++	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
     +	EOF
    -+	rm $tag_dir_prefix/tag-2-bad &&
    ++	rm $tag_dir_prefix/tag-bad-2 &&
     +	test_cmp expect err &&
     +
    -+	printf "xfsazqfxcadas" > $branch_dir_prefix/a/b/branch-2-bad &&
    ++	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/heads/a/b/branch-2-bad: badRefContent: invalid ref content
    ++	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
     +	EOF
    -+	rm $branch_dir_prefix/a/b/branch-2-bad &&
    ++	rm $branch_dir_prefix/a/b/branch-bad &&
     +	test_cmp expect err
     +'
    ++
    ++test_expect_success 'regular ref content should be checked (aggregate)' '
    ++	test_when_finished "rm -rf repo" &&
    ++	git init repo &&
    ++	branch_dir_prefix=.git/refs/heads &&
    ++	tag_dir_prefix=.git/refs/tags &&
    ++	cd repo &&
    ++	test_commit default &&
    ++	mkdir -p "$branch_dir_prefix/a/b" &&
    ++
    ++	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
    ++	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
    ++	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
    ++	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
    ++	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
    ++	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
    ++	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
    ++	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
    ++	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
    ++
    ++	test_must_fail git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
    ++	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
    ++	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
    ++	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
    ++	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
    ++	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
    ++	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
    ++	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
    ++	warning: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
    ++	EOF
    ++	sort err >sorted_err &&
    ++	test_cmp expect sorted_err
    ++'
     +
      test_done
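For readers following the aggregate test above, here is a minimal standalone C sketch of how regular ref content is classified into the three messages the test expects. It is not the code in refs/files-backend.c. The names `classify_regular_ref`, `enum ref_check` and `OID_HEX_LEN` are hypothetical, and it assumes 40-hex SHA-1 object names only.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical result codes; each maps to one fsck message id used by
 * the "regular ref content should be checked" tests.
 */
enum ref_check {
	REF_OK,
	REF_BAD_CONTENT,      /* badRefContent (ERROR) */
	REF_MISSING_NEWLINE,  /* refMissingNewline (INFO, printed as warning) */
	REF_TRAILING_CONTENT  /* trailingRefContent (INFO, printed as warning) */
};

/* Assumption: SHA-1 repositories only (40 hex digits per object name). */
#define OID_HEX_LEN 40

static enum ref_check classify_regular_ref(const char *content)
{
	const char *trailing;
	size_t i;

	/* The content must start with a full hexadecimal object name. */
	for (i = 0; i < OID_HEX_LEN; i++)
		if (!isxdigit((unsigned char)content[i]))
			return REF_BAD_CONTENT; /* also stops at an early NUL */
	trailing = content + OID_HEX_LEN;

	/* The object name must be followed by end-of-file or whitespace. */
	if (*trailing && !isspace((unsigned char)*trailing))
		return REF_BAD_CONTENT;

	/* Exactly one LF is expected after the object name. */
	if (!*trailing)
		return REF_MISSING_NEWLINE;
	if (strcmp(trailing, "\n"))
		return REF_TRAILING_CONTENT;
	return REF_OK;
}
```

With this ordering, "<oid>x" is an error (tag-bad-1), while "<oid>" followed by whitespace and anything else only earns the trailing-content warning (tag-garbage-1 through tag-garbage-4).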
3:  76dcf6bf58 ! 3:  0691e2960d ref: add symbolic ref content check for files backend
    @@ Metadata
     Author: shejialuo <shejialuo@gmail.com>
     
      ## Commit message ##
    -    ref: add symbolic ref content check for files backend
    +    ref: add symref content check for files backend
     
         We have already introduced the checks for regular refs. There is no need
    -    to check the consistency of the target which the symbolic ref points to.
    -    Instead, we just check the content of the symbolic ref itself.
    +    to check the consistency of the target which the symref points to.
    +    Instead, we just need to check the content of the symref itself.
     
    -    In order to check the content of the symbolic ref, create a function
    -    "files_fsck_symref_target". It will first check whether the "pointee" is
    -    under the "refs/" directory and then we will check the "pointee" itself.
    +    In order to check the content of the symref, create a function
    +    "files_fsck_symref_target". It will first check whether the "referent"
    +    is under the "refs/" directory and then we will check the symref
    +    contents.
     
    -    There is no specification about the content of the symbolic ref.
    -    Although we do write "ref: %s\n" to create a symbolic ref by using
    -    "git-symbolic-ref(1)" command. However, this is not mandatory. We still
    -    accept symbolic refs with null trailing garbage. Put it more specific,
    -    the following are correct:
    +    A regular file is accepted as a textual symref if it begins with
    +    "ref:", followed by zero or more whitespaces, followed by the full
    +    refname, followed only by whitespace characters. We always write
    +    a single SP after "ref:" and a single LF after the refname, but
    +    third-party reimplementations of Git may have taken advantage of the
    +    looser syntax. To put it more specifically, we accept the following
    +    symref contents:
     
         1. "ref: refs/heads/master   "
         2. "ref: refs/heads/master   \n  \n"
         3. "ref: refs/heads/master\n\n"
     
    -    But we do not allow any non-null trailing garbage. The following are bad
    -    symbolic contents which will be reported as fsck error by "git-fsck(1)".
    +    But we do not allow any other trailing garbage. The following are bad
    +    symref contents which will be reported as fsck errors by "git-fsck(1)".
     
         1. "ref: refs/heads/master garbage\n"
         2. "ref: refs/heads/master \n\n\n garbage  "
     
    -    In order to provide above checks, we will use "strrchr" to check whether
    -    we have newline in the ref content. Then we will check the name of the
    -    "pointee" is correct by using "check_refname_format". If the function
    -    fails, we need to trim the "pointee" to see whether the null-garbage
    -    causes the function fails. If so, we need to report that there is
    -    null-garbage in the symref content. Otherwise, we should report the user
    -    the "pointee" is bad.
    +    To provide the above checks, we first check whether the symref content
    +    is missing the trailing newline by peeking at the last byte of the
    +    "referent" to see whether it is '\n'.
    +
    +    Then we remember the untrimmed length of the "referent" and call
    +    "strbuf_rtrim()" on it. Next, we call "check_refname_format" to check
    +    whether the trimmed referent has a valid format. If not, we report to
    +    the user that the symref points to a referent with an invalid format.
    +    If it is valid, we compare the untrimmed and trimmed lengths; if they
    +    differ, we warn the user that there is trailing garbage in the symref
    +    content.
    +
    +    Finally, we need to check whether the referent is a directory. We
    +    cannot tell whether "refs/heads/a" is a directory by using
    +    "check_refname_format". We have already checked for bad file types
    +    when iterating the "refs/" directory, but directories are skipped
    +    there. Thus, we need to add an explicit check here.
     
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
    @@ Documentation/fsck-msgids.txt
      `badRefName`::
      	(ERROR) A ref has an invalid format.
      
    -+`badSymrefPointee`::
    -+	(ERROR) The pointee of a symref is bad.
    ++`badSymrefTarget`::
    ++	(ERROR) The symref target points outside the ref directory or
    ++	the name of the symref target is invalid.
     +
      `badTagName`::
      	(INFO) A tag has an invalid format.
    @@ fsck.h: enum fsck_msg_type {
      	FUNC(BAD_REF_CONTENT, ERROR) \
      	FUNC(BAD_REF_FILETYPE, ERROR) \
      	FUNC(BAD_REF_NAME, ERROR) \
    -+	FUNC(BAD_SYMREF_POINTEE, ERROR) \
    ++	FUNC(BAD_SYMREF_TARGET, ERROR) \
      	FUNC(BAD_TIMEZONE, ERROR) \
      	FUNC(BAD_TREE, ERROR) \
      	FUNC(BAD_TREE_SHA1, ERROR) \
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
      				  struct dir_iterator *iter);
      
     +/*
    -+ * Check the symref "pointee_name" and "pointee_path". The caller should
    -+ * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
    -+ * would be the content after "refs:".
    ++ * Check the symref "referent" and "referent_path". For textual symref,
    ++ * "referent" would be the content after "refs:".
     + */
     +static int files_fsck_symref_target(struct fsck_options *o,
     +				    struct fsck_ref_report *report,
    -+				    const char *refname,
    -+				    struct strbuf *pointee_name,
    -+				    struct strbuf *pointee_path)
    ++				    struct strbuf *referent,
    ++				    struct strbuf *referent_path)
     +{
    -+	const char *newline_pos = NULL;
    ++	size_t len = referent->len - 1;
     +	const char *p = NULL;
     +	struct stat st;
     +	int ret = 0;
     +
    -+	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
    ++	if (!skip_prefix(referent->buf, "refs/", &p)) {
     +
     +		ret = fsck_report_ref(o, report,
    -+				      FSCK_MSG_BAD_SYMREF_POINTEE,
    ++				      FSCK_MSG_BAD_SYMREF_TARGET,
     +				      "points to ref outside the refs directory");
     +		goto out;
     +	}
     +
    -+	newline_pos = strrchr(p, '\n');
    -+	if (!newline_pos || *(newline_pos + 1)) {
    ++	if (referent->buf[referent->len - 1] != '\n') {
     +		ret = fsck_report_ref(o, report,
     +				      FSCK_MSG_REF_MISSING_NEWLINE,
     +				      "missing newline");
    ++		len++;
     +	}
     +
    -+	if (check_refname_format(pointee_name->buf, 0)) {
    -+		/*
    -+		 * When containing null-garbage, "check_refname_format" will
    -+		 * fail, we should trim the "pointee" to check again.
    -+		 */
    -+		strbuf_rtrim(pointee_name);
    -+		if (!check_refname_format(pointee_name->buf, 0)) {
    -+			ret = fsck_report_ref(o, report,
    -+					      FSCK_MSG_TRAILING_REF_CONTENT,
    -+					      "trailing null-garbage");
    -+			goto out;
    -+		}
    -+
    ++	strbuf_rtrim(referent);
    ++	if (check_refname_format(referent->buf, 0)) {
     +		ret = fsck_report_ref(o, report,
    -+				      FSCK_MSG_BAD_SYMREF_POINTEE,
    ++				      FSCK_MSG_BAD_SYMREF_TARGET,
     +				      "points to refname with invalid format");
    ++		goto out;
    ++	}
    ++
    ++	if (len != referent->len) {
    ++		ret = fsck_report_ref(o, report,
    ++				      FSCK_MSG_TRAILING_REF_CONTENT,
    ++				      "trailing garbage in ref");
     +	}
     +
     +	/*
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +	 * ref that does not exist yet. If the target ref does not exist, just
     +	 * skip the check for the file type.
     +	 */
    -+	if (lstat(pointee_path->buf, &st) < 0)
    ++	if (lstat(referent_path->buf, &st))
     +		goto out;
     +
    -+	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
    ++	/*
    ++	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
    ++	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
    ++	 * check the file type of the target.
    ++	 */
    ++	if (S_ISDIR(st.st_mode)) {
     +		ret = fsck_report_ref(o, report,
    -+				      FSCK_MSG_BAD_SYMREF_POINTEE,
    -+				      "points to an invalid file type");
    ++				      FSCK_MSG_BAD_SYMREF_TARGET,
    ++				      "points to the directory");
     +		goto out;
     +	}
     +
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
      				   const char *refs_check_dir,
      				   struct dir_iterator *iter)
      {
    -+	struct strbuf pointee_path = STRBUF_INIT;
    ++	struct strbuf referent_path = STRBUF_INIT;
      	struct strbuf ref_content = STRBUF_INIT;
      	struct strbuf referent = STRBUF_INIT;
      	struct strbuf refname = STRBUF_INIT;
     @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
    - 						      "trailing garbage in ref");
    - 				goto cleanup;
    - 			}
    -+		} else {
    -+			strbuf_addf(&pointee_path, "%s/%s",
    -+				    ref_store->gitdir, referent.buf);
    -+			ret = files_fsck_symref_target(o, &report, refname.buf,
    -+						       &referent,
    -+						       &pointee_path);
    + 					      "trailing garbage in ref");
    + 			goto cleanup;
      		}
    - 		goto cleanup;
    ++	} else {
    ++		strbuf_addf(&referent_path, "%s/%s",
    ++			    ref_store->gitdir, referent.buf);
    ++		/*
    ++		 * The referent may contain trailing spaces and newlines, so
    ++		 * trim them for the path.
    ++		 */
    ++		strbuf_rtrim(&referent_path);
    ++		ret = files_fsck_symref_target(o, &report,
    ++					       &referent,
    ++					       &referent_path);
      	}
    -@@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
    + 
    + cleanup:
      	strbuf_release(&refname);
      	strbuf_release(&ref_content);
      	strbuf_release(&referent);
    -+	strbuf_release(&pointee_path);
    ++	strbuf_release(&referent_path);
      	return ret;
      }
      
     
      ## t/t0602-reffiles-fsck.sh ##
    -@@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be checked' '
    - 	test_cmp expect err
    +@@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be checked (aggregate)' '
    + 	test_cmp expect sorted_err
      '
      
    -+test_expect_success 'symbolic ref content should be checked' '
    ++test_expect_success 'textual symref content should be checked (individual)' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	branch_dir_prefix=.git/refs/heads &&
     +	tag_dir_prefix=.git/refs/tags &&
     +	cd repo &&
    -+	git commit --allow-empty -m initial &&
    -+	git checkout -b branch-1 &&
    -+	git tag tag-1 &&
    -+	git checkout -b a/b/branch-2 &&
    ++	test_commit default &&
    ++	mkdir -p "$branch_dir_prefix/a/b" &&
    ++
    ++	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
    ++	git refs verify 2>err &&
    ++	rm $branch_dir_prefix/branch-good &&
    ++	test_must_be_empty err &&
    ++
    ++	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
    ++	git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
    ++	EOF
    ++	rm $branch_dir_prefix/branch-no-newline-1 &&
    ++	test_cmp expect err &&
     +
    -+	printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline &&
    ++	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
    ++	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
    ++	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $branch_dir_prefix/branch-1-no-newline &&
    ++	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
     +	test_cmp expect err &&
     +
    -+	printf "ref: refs/heads/branch     " > $branch_dir_prefix/a/b/branch-trailing &&
    ++	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline
    -+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
    ++	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $branch_dir_prefix/a/b/branch-trailing &&
    ++	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
     +	test_cmp expect err &&
     +
    -+	printf "ref: refs/heads/branch\n\n" > $branch_dir_prefix/a/b/branch-trailing &&
    ++	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
    ++	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $branch_dir_prefix/a/b/branch-trailing &&
    ++	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
     +	test_cmp expect err &&
     +
    -+	printf "ref: refs/heads/branch \n\n " > $branch_dir_prefix/a/b/branch-trailing &&
    ++	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline
    -+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
    ++	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
    ++	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
     +	EOF
    -+	rm $branch_dir_prefix/a/b/branch-trailing &&
    ++	rm $branch_dir_prefix/a/b/branch-complicated &&
     +	test_cmp expect err &&
     +
    -+	printf "ref: refs/heads/.branch\n" > $branch_dir_prefix/branch-2-bad &&
    ++	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/heads/branch-2-bad: badSymrefPointee: points to refname with invalid format
    ++	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
     +	EOF
    -+	rm $branch_dir_prefix/branch-2-bad &&
    ++	rm $branch_dir_prefix/branch-bad-1 &&
    ++	test_cmp expect err &&
    ++
    ++	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
    ++	test_must_fail git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
    ++	EOF
    ++	rm $branch_dir_prefix/branch-bad-2 &&
    ++	test_cmp expect err &&
    ++
    ++	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
    ++	test_must_fail git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
    ++	EOF
    ++	rm $branch_dir_prefix/branch-bad-3 &&
     +	test_cmp expect err
     +'
    ++
    ++test_expect_success 'textual symref content should be checked (aggregate)' '
    ++	test_when_finished "rm -rf repo" &&
    ++	git init repo &&
    ++	branch_dir_prefix=.git/refs/heads &&
    ++	tag_dir_prefix=.git/refs/tags &&
    ++	cd repo &&
    ++	test_commit default &&
    ++	mkdir -p "$branch_dir_prefix/a/b" &&
    ++
    ++	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
    ++	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
    ++	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
    ++	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
    ++	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
    ++	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
    ++	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
    ++	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
    ++	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
    ++
    ++	test_must_fail git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
    ++	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
    ++	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
    ++	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
    ++	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
    ++	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
    ++	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
    ++	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
    ++	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
    ++	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
    ++	EOF
    ++	sort err >sorted_err &&
    ++	test_cmp expect sorted_err
    ++'
     +
      test_done
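The textual symref checks described in the commit message of patch 3 can be sketched in standalone C as follows, in the same order: "refs/" prefix, missing newline, trim and validate the refname, compare lengths, then reject a directory target. This is only an illustration, not the patch itself. `check_textual_symref`, `refname_format_ok` and the SYMREF_* flags are hypothetical, and `refname_format_ok` is a deliberately reduced stand-in for `check_refname_format()` that does not implement all of Git's refname rules.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result flags, one per fsck message id. */
#define SYMREF_BAD_TARGET      (1u << 0) /* badSymrefTarget */
#define SYMREF_MISSING_NEWLINE (1u << 1) /* refMissingNewline */
#define SYMREF_TRAILING        (1u << 2) /* trailingRefContent */

/*
 * Simplified stand-in for check_refname_format(): rejects empty
 * components, components starting with '.', "..", whitespace, control
 * characters and ~^:?*[\ only.
 */
static int refname_format_ok(const char *name)
{
	const char *component = name;

	for (;;) {
		const char *slash = strchr(component, '/');
		size_t len = slash ? (size_t)(slash - component) : strlen(component);

		if (!len || component[0] == '.')
			return 0;
		for (size_t i = 0; i < len; i++) {
			unsigned char c = component[i];

			if (c <= ' ' || c == 0x7f || strchr("~^:?*[\\", c))
				return 0;
			if (c == '.' && i + 1 < len && component[i + 1] == '.')
				return 0;
		}
		if (!slash)
			return 1;
		component = slash + 1;
	}
}

/* "referent" is the symref content after "ref:" and its whitespace. */
static unsigned check_textual_symref(const char *referent,
				     const char *referent_path)
{
	unsigned result = 0;
	size_t raw_len = strlen(referent);
	size_t expected_len, trimmed_len;
	char name[4096];
	struct stat st;

	/* Targets outside "refs/" are rejected before anything else. */
	if (strncmp(referent, "refs/", 5))
		return SYMREF_BAD_TARGET;

	/* Well-formed content ends with exactly one LF after the refname. */
	expected_len = raw_len - 1;
	if (referent[raw_len - 1] != '\n') {
		result |= SYMREF_MISSING_NEWLINE;
		expected_len++;
	}

	/* Strip trailing whitespace, like strbuf_rtrim(). */
	trimmed_len = raw_len;
	while (trimmed_len && isspace((unsigned char)referent[trimmed_len - 1]))
		trimmed_len--;
	if (trimmed_len >= sizeof(name))
		return result | SYMREF_BAD_TARGET;
	memcpy(name, referent, trimmed_len);
	name[trimmed_len] = '\0';

	if (!refname_format_ok(name))
		return result | SYMREF_BAD_TARGET;

	/* Anything beyond the single expected LF is trailing garbage. */
	if (trimmed_len != expected_len)
		result |= SYMREF_TRAILING;

	/* A missing target is fine (e.g. unborn branch); a directory is not. */
	if (!lstat(referent_path, &st) && S_ISDIR(st.st_mode))
		result |= SYMREF_BAD_TARGET;

	return result;
}
```

Keeping the untrimmed length and comparing it after trimming is what lets a single check cover both "ref: refs/heads/branch\n\n" and "ref: refs/heads/branch \n  " without special-casing each shape of trailing whitespace.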
4:  2008f8635c ! 4:  4105bfa1e3 ref: add symlink ref check for files backend
    @@ Metadata
     Author: shejialuo <shejialuo@gmail.com>
     
      ## Commit message ##
    -    ref: add symlink ref check for files backend
    +    ref: add symlink ref content check for files backend
     
         We have already introduced "files_fsck_symref_target". We should reuse
    -    this function to handle the symrefs which are legacy symbolic links. We
    -    should not check the trailing garbage for symbolic links. Add a new
    +    this function to handle the symrefs which use legacy symbolic links. We
    +    should not check the trailing garbage for symlink refs. Add a new
         parameter "symbolic_link" to disable some checks which should only be
    -    used for symbolic ref.
    +    executed for textual symrefs.
     
    -    We firstly use the "strbuf_add_real_path" to resolve the symlinks and
    -    get the absolute path "pointee_path" which the symlink ref points to.
    -    Then we can get the absolute path "abs_gitdir" of the "gitdir". By
    -    combining "pointee_path" and "abs_gitdir", we can extract the
    +    We first use "strbuf_add_real_path" to resolve the symlink and
    +    get the absolute path "referent_path" which the symlink ref points
    +    to. Then we can get the absolute path "abs_gitdir" of the "gitdir".
    +    By combining "referent_path" and "abs_gitdir", we can extract the
         "referent". Thus, we can reuse "files_fsck_symref_target" function to
         seamlessly check the symlink refs.
     
    +    Because we are going to drop support for "core.prefersymlinkrefs", add a
    +    new fsck message "symlinkRef" to make the user aware of this
    +    deprecation.
    +
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
         Signed-off-by: shejialuo <shejialuo@gmail.com>
     
    + ## Documentation/fsck-msgids.txt ##
    +@@
    + 	(INFO) A ref does not end with newline. This kind of ref may
    + 	be considered ERROR in the future.
    + 
    ++`symlinkRef`::
    ++	(INFO) A symref uses a symbolic link. This kind of symref may
    ++	be considered ERROR in the future, once symlink support is
    ++	dropped entirely.
    ++
    + `trailingRefContent`::
    + 	(INFO) A ref has trailing contents. This kind of ref may be
    + 	considered ERROR in the future.
    +
    + ## fsck.h ##
    +@@ fsck.h: enum fsck_msg_type {
    + 	FUNC(BAD_TAG_NAME, INFO) \
    + 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
    + 	FUNC(REF_MISSING_NEWLINE, INFO) \
    ++	FUNC(SYMLINK_REF, INFO) \
    + 	FUNC(TRAILING_REF_CONTENT, INFO) \
    + 	/* ignored (elevated when requested) */ \
    + 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
    +
      ## refs/files-backend.c ##
     @@
      #include "../git-compat-util.h"
    @@ refs/files-backend.c: static int lock_ref_for_update(struct files_ref_store *ref
      			goto out;
      		}
     @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
    + 
      /*
    -  * Check the symref "pointee_name" and "pointee_path". The caller should
    -  * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
    -- * would be the content after "refs:".
    -+ * would be the content after "refs:". For symblic link, "pointee_name" would
    -+ * be the relative path agaignst "gitdir".
    +  * Check the symref "referent" and "referent_path". For textual symref,
    +- * "referent" would be the content after "refs:".
    ++ * "referent" would be the content after "refs:". For symlink ref,
    ++ * "referent" would be the relative path against "gitdir" which should
    ++ * be the same as the textual symref literally.
       */
      static int files_fsck_symref_target(struct fsck_options *o,
      				    struct fsck_ref_report *report,
    --				    const char *refname,
    - 				    struct strbuf *pointee_name,
    --				    struct strbuf *pointee_path)
    -+				    struct strbuf *pointee_path,
    + 				    struct strbuf *referent,
    +-				    struct strbuf *referent_path)
    ++				    struct strbuf *referent_path,
     +				    unsigned int symbolic_link)
      {
    - 	const char *newline_pos = NULL;
    + 	size_t len = referent->len - 1;
      	const char *p = NULL;
     @@ refs/files-backend.c: static int files_fsck_symref_target(struct fsck_options *o,
      		goto out;
      	}
      
    --	newline_pos = strrchr(p, '\n');
    --	if (!newline_pos || *(newline_pos + 1)) {
    --		ret = fsck_report_ref(o, report,
    --				      FSCK_MSG_REF_MISSING_NEWLINE,
    --				      "missing newline");
    -+	if (!symbolic_link) {
    -+		newline_pos = strrchr(p, '\n');
    -+		if (!newline_pos || *(newline_pos + 1)) {
    -+			ret = fsck_report_ref(o, report,
    -+					      FSCK_MSG_REF_MISSING_NEWLINE,
    -+					      "missing newline");
    -+		}
    +-	if (referent->buf[referent->len - 1] != '\n') {
    ++	if (!symbolic_link && referent->buf[referent->len - 1] != '\n') {
    + 		ret = fsck_report_ref(o, report,
    + 				      FSCK_MSG_REF_MISSING_NEWLINE,
    + 				      "missing newline");
    + 		len++;
      	}
      
    - 	if (check_refname_format(pointee_name->buf, 0)) {
    --		/*
    --		 * When containing null-garbage, "check_refname_format" will
    --		 * fail, we should trim the "pointee" to check again.
    --		 */
    --		strbuf_rtrim(pointee_name);
    --		if (!check_refname_format(pointee_name->buf, 0)) {
    --			ret = fsck_report_ref(o, report,
    --					      FSCK_MSG_TRAILING_REF_CONTENT,
    --					      "trailing null-garbage");
    --			goto out;
    -+		if (!symbolic_link) {
    -+			/*
    -+			* When containing null-garbage, "check_refname_format" will
    -+			* fail, we should trim the "pointee" to check again.
    -+			*/
    -+			strbuf_rtrim(pointee_name);
    -+			if (!check_refname_format(pointee_name->buf, 0)) {
    -+				ret = fsck_report_ref(o, report,
    -+						      FSCK_MSG_TRAILING_REF_CONTENT,
    -+						      "trailing null-garbage");
    -+				goto out;
    -+			}
    - 		}
    +-	strbuf_rtrim(referent);
    ++	if (!symbolic_link)
    ++		strbuf_rtrim(referent);
    ++
    + 	if (check_refname_format(referent->buf, 0)) {
    + 		ret = fsck_report_ref(o, report,
    + 				      FSCK_MSG_BAD_SYMREF_TARGET,
    +@@ refs/files-backend.c: static int files_fsck_symref_target(struct fsck_options *o,
    + 		goto out;
    + 	}
      
    +-	if (len != referent->len) {
    ++	if (!symbolic_link && len != referent->len) {
      		ret = fsck_report_ref(o, report,
    + 				      FSCK_MSG_TRAILING_REF_CONTENT,
    + 				      "trailing garbage in ref");
     @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
      {
    - 	struct strbuf pointee_path = STRBUF_INIT;
    + 	struct strbuf referent_path = STRBUF_INIT;
      	struct strbuf ref_content = STRBUF_INIT;
     +	struct strbuf abs_gitdir = STRBUF_INIT;
      	struct strbuf referent = STRBUF_INIT;
      	struct strbuf refname = STRBUF_INIT;
      	struct fsck_ref_report report = {0};
    -+	const char *pointee_name = NULL;
     +	unsigned int symbolic_link = 0;
      	const char *trailing = NULL;
      	unsigned int type = 0;
      	int failure_errno = 0;
     @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
    - 		} else {
    - 			strbuf_addf(&pointee_path, "%s/%s",
    - 				    ref_store->gitdir, referent.buf);
    --			ret = files_fsck_symref_target(o, &report, refname.buf,
    -+			ret = files_fsck_symref_target(o, &report,
    - 						       &referent,
    --						       &pointee_path);
    -+						       &pointee_path,
    -+						       symbolic_link);
    - 		}
    - 		goto cleanup;
    - 	}
    + 	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
    + 	report.path = refname.buf;
      
    -+	symbolic_link = 1;
    -+
    -+	strbuf_add_real_path(&pointee_path, iter->path.buf);
    -+	strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
    -+	strbuf_normalize_path(&abs_gitdir);
    -+	if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
    -+		strbuf_addch(&abs_gitdir, '/');
    +-	if (S_ISLNK(iter->st.st_mode))
    ++	if (S_ISLNK(iter->st.st_mode)) {
    ++		const char* relative_referent_path;
     +
    -+	if (!skip_prefix(pointee_path.buf, abs_gitdir.buf, &pointee_name)) {
    ++		symbolic_link = 1;
     +		ret = fsck_report_ref(o, &report,
    -+				      FSCK_MSG_BAD_SYMREF_POINTEE,
    -+				      "point to target outside gitdir");
    -+		goto cleanup;
    -+	}
    ++				      FSCK_MSG_SYMLINK_REF,
    ++				      "use deprecated symbolic link for symref");
     +
    -+	strbuf_addstr(&referent, pointee_name);
    -+	ret = files_fsck_symref_target(o, &report,
    -+				       &referent, &pointee_path,
    -+				       symbolic_link);
    ++		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
    ++		strbuf_normalize_path(&abs_gitdir);
    ++		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
    ++			strbuf_addch(&abs_gitdir, '/');
     +
    ++		strbuf_add_real_path(&referent_path, iter->path.buf);
    ++
    ++		if (!skip_prefix(referent_path.buf,
    ++				 abs_gitdir.buf,
    ++				 &relative_referent_path)) {
    ++			ret = fsck_report_ref(o, &report,
    ++					      FSCK_MSG_BAD_SYMREF_TARGET,
    ++					      "point to target outside gitdir");
    ++			goto cleanup;
    ++		}
    ++
    ++		strbuf_addstr(&referent, relative_referent_path);
    ++		ret = files_fsck_symref_target(o, &report,
    ++					       &referent, &referent_path,
    ++					       symbolic_link);
    ++
    + 		goto cleanup;
    ++	}
    + 
    + 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
    + 		ret = error_errno(_("%s/%s: unable to read the ref"),
    +@@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
    + 		strbuf_rtrim(&referent_path);
    + 		ret = files_fsck_symref_target(o, &report,
    + 					       &referent,
    +-					       &referent_path);
    ++					       &referent_path,
    ++					       symbolic_link);
    + 	}
    + 
      cleanup:
    - 	strbuf_release(&refname);
    +@@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
      	strbuf_release(&ref_content);
      	strbuf_release(&referent);
    - 	strbuf_release(&pointee_path);
    + 	strbuf_release(&referent_path);
     +	strbuf_release(&abs_gitdir);
      	return ret;
      }
      
     
      ## t/t0602-reffiles-fsck.sh ##
    -@@ t/t0602-reffiles-fsck.sh: test_expect_success 'symbolic ref content should be checked' '
    - 	test_cmp expect err
    +@@ t/t0602-reffiles-fsck.sh: test_expect_success 'textual symref content should be checked (aggregate)' '
    + 	test_cmp expect sorted_err
      '
      
    -+test_expect_success SYMLINKS 'symbolic ref (symbolic link) content should be checked' '
    ++test_expect_success SYMLINKS 'symlink symref content should be checked (individual)' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	branch_dir_prefix=.git/refs/heads &&
     +	tag_dir_prefix=.git/refs/tags &&
     +	cd repo &&
    -+	git commit --allow-empty -m initial &&
    -+	git checkout -b branch-1 &&
    -+	git tag tag-1 &&
    -+	git checkout -b a/b/branch-2 &&
    ++	test_commit default &&
    ++	mkdir -p "$branch_dir_prefix/a/b" &&
    ++
    ++	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
    ++	git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
    ++	EOF
    ++	rm $branch_dir_prefix/branch-symbolic-good &&
    ++	test_cmp expect err &&
    ++
    ++	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
    ++	test_must_fail git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
    ++	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
    ++	EOF
    ++	rm $branch_dir_prefix/branch-symbolic-1 &&
    ++	test_cmp expect err &&
     +
    -+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic &&
    ++	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/heads/branch-symbolic: badSymrefPointee: point to target outside gitdir
    ++	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
    ++	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
     +	EOF
    -+	rm $branch_dir_prefix/branch-symbolic &&
    ++	rm $branch_dir_prefix/branch-symbolic-2 &&
     +	test_cmp expect err &&
     +
    -+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic &&
    ++	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/heads/branch-symbolic: badSymrefPointee: points to ref outside the refs directory
    ++	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
    ++	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
     +	EOF
    -+	rm $branch_dir_prefix/branch-symbolic &&
    ++	rm $branch_dir_prefix/branch-symbolic-3 &&
     +	test_cmp expect err &&
     +
    -+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic &&
    ++	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/heads/branch-symbolic: badSymrefPointee: points to refname with invalid format
    ++	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
    ++	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
     +	EOF
    -+	rm $branch_dir_prefix/branch-symbolic &&
    ++	rm $tag_dir_prefix/tag-symbolic-1 &&
     +	test_cmp expect err &&
     +
    -+	ln -sf ./".branch" $branch_dir_prefix/branch-symbolic &&
    ++	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
     +	test_must_fail git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	error: refs/heads/branch-symbolic: badSymrefPointee: points to refname with invalid format
    ++	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
    ++	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
     +	EOF
    -+	rm $branch_dir_prefix/branch-symbolic &&
    ++	rm $tag_dir_prefix/tag-symbolic-2 &&
     +	test_cmp expect err
     +'
    ++
    ++test_expect_success SYMLINKS 'symlink symref content should be checked (aggregate)' '
    ++	test_when_finished "rm -rf repo" &&
    ++	git init repo &&
    ++	branch_dir_prefix=.git/refs/heads &&
    ++	tag_dir_prefix=.git/refs/tags &&
    ++	cd repo &&
    ++	test_commit default &&
    ++	mkdir -p "$branch_dir_prefix/a/b" &&
    ++
    ++	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
    ++	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
    ++	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
    ++	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
    ++	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
    ++	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
    ++
    ++	test_must_fail git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
    ++	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
    ++	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
    ++	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
    ++	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
    ++	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
    ++	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
    ++	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
    ++	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
    ++	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
    ++	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
    ++	EOF
    ++	sort err >sorted_err &&
    ++	test_cmp expect sorted_err
    ++'
     +
      test_done
-- 
2.46.0


^ permalink raw reply	[flat|nested] 209+ messages in thread

* [PATCH v3 1/4] ref: initialize "fsck_ref_report" with zero
  2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
@ 2024-09-03 12:20       ` shejialuo
  2024-09-03 12:20       ` [PATCH v3 2/4] ref: add regular ref content check for files backend shejialuo
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-09-03 12:20 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So we must always initialize these members to NULL,
instead of leaving them pointing anywhere, when creating a new
"fsck_ref_report" structure.

The original code explicitly initializes the "path" member of
"struct fsck_ref_report" to NULL (which implicitly zero-initializes the
other members). It is more customary to use "{ 0 }" to express that we
are zero-initializing everything. To align with the rest of the
codebase, initialize "fsck_ref_report" with zero.
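
As a minimal illustration of why both initializers leave "oid" and
"referent" NULL, here is a self-contained sketch. The struct
"demo_ref_report" is a hypothetical stand-in that only mirrors the three
members the message mentions; it is not git's "struct fsck_ref_report".
In C, members not named in a brace initializer are zero-initialized, so
both forms behave the same, and "{ 0 }" simply states the intent directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the report struct; not git's definition. */
struct demo_ref_report {
	const char *path;
	const void *oid;      /* stands in for the "oid" member */
	const char *referent;
};

/* Returns 1 when every pointer member is NULL. */
static int report_is_zeroed(const struct demo_ref_report *r)
{
	assert(r != NULL);
	return r->path == NULL && r->oid == NULL && r->referent == NULL;
}

static int demo_both_forms_zero(void)
{
	/* Old form: only "path" is named; the other members are zeroed. */
	struct demo_ref_report designated = { .path = NULL };
	/* New form: the conventional "zero everything" initializer. */
	struct demo_ref_report zeroed = { 0 };

	return report_is_zeroed(&designated) && report_is_zeroed(&zeroed);
}
```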

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8d6ec9458d..890d0324e1 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3446,7 +3446,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.46.0



* [PATCH v3 2/4] ref: add regular ref content check for files backend
  2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
  2024-09-03 12:20       ` [PATCH v3 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-09-03 12:20       ` shejialuo
  2024-09-09 15:04         ` Patrick Steinhardt
  2024-09-10 16:07         ` karthik nayak
  2024-09-03 12:20       ` [PATCH v3 3/4] ref: add symref " shejialuo
                         ` (2 subsequent siblings)
  4 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-09-03 12:20 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We implicitly rely on "git-fsck(1)" to check the consistency of regular
refs. However, when parsing regular refs in the files backend with
"files-backend.c::parse_loose_ref_contents", we allow the ref content to
end without a newline or to contain trailing garbage.

Even though we never create such loose refs ourselves, we have always
accepted them, so it is entirely possible that some third-party tools
rely on such loose refs being valid. We should not report them as fsck
errors for now. Instead, let's notice such "curiously formatted" loose
refs and tell the user what we find, so that we can assess the possible
extent of damage when we tighten the parsing rules in the future.

Nor is it suitable to report these cases as fsck warnings. If the caller
sets the "strict" field in "fsck_options", fsck warnings are
automatically upgraded to errors, and for now we should not let the user
upgrade these warnings to errors with the "--strict" flag: doing so could
cause compatibility issues and break legacy repositories. So we add the
following two fsck infos to represent the situations where the ref
content ends without a newline or has trailing garbage:

1. "refMissingNewline(INFO)": A ref does not end with newline. This kind
   of ref may be considered ERROR in the future.
2. "trailingRefContent(INFO)": A ref has trailing contents. This kind of
   ref may be considered ERROR in the future.

It may seem that reporting these as fsck infos means the user gets no
warning at all. However, "fsck.c::fsck_vreport" converts "FSCK_INFO" to
"FSCK_WARN", so we can still warn the user about these situations when
running "git refs verify" without introducing compatibility issues.
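
The reasoning above depends on the order of two promotions, which can be
modelled in a few lines. This is a simplified sketch with hypothetical
names ("demo_msg_type", "demo_reported_type"), not code from fsck.c. The
message only states the INFO-to-WARN conversion in "fsck_vreport"; the
assumption modelled here is that the "--strict" upgrade applies to the
default WARN severity before that conversion, so an INFO is shown as a
warning but never becomes an error.

```c
#include <assert.h>

/* Simplified model; these names are hypothetical, not git's fsck API. */
enum demo_msg_type { DEMO_IGNORE, DEMO_INFO, DEMO_WARN, DEMO_ERROR };

/*
 * "--strict" promotes WARN to ERROR when the default severity is looked
 * up; only afterwards is an INFO displayed to the user as a warning.
 */
static enum demo_msg_type demo_reported_type(enum demo_msg_type dflt,
					     int strict)
{
	enum demo_msg_type type = dflt;

	assert(dflt <= DEMO_ERROR);
	if (strict && type == DEMO_WARN)
		type = DEMO_ERROR;   /* --strict upgrades warnings */
	if (type == DEMO_INFO)
		type = DEMO_WARN;    /* infos are printed as warnings */
	return type;
}
```

Under this model, "refMissingNewline" and "trailingRefContent" print as
warnings with or without "--strict", while a message declared as WARN
would fail the check under "--strict".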

The current "git-fsck(1)" reports an error when the ref content is bad,
so we follow suit and report an error to the user when
"parse_loose_ref_contents" fails. We add a new fsck error message,
"badRefContent(ERROR)", to represent a ref with bad content.

To tell whether the ref has trailing content, add a new parameter
"trailing" to "parse_loose_ref_contents", which receives a pointer to the
bytes following the object ID. Then introduce a new function
"files_fsck_refs_content" that checks regular refs as part of
"git refs verify".
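
The classification that "files_fsck_refs_content" performs on a regular
ref can be sketched standalone. This is a hedged illustration, not the
patch itself: "demo_check_regular_ref" and its result codes are
hypothetical, and a plain hex-digit loop stands in for
"parse_oid_hex_algop()". The order of checks follows the diff below: a
non-space right after the object ID is bad content, an empty tail means
a missing newline, and anything other than a single LF is trailing
content.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical outcome codes mirroring the new fsck message IDs. */
enum demo_ref_check {
	DEMO_REF_OK,
	DEMO_REF_BAD_CONTENT,      /* badRefContent (ERROR) */
	DEMO_REF_MISSING_NEWLINE,  /* refMissingNewline (INFO) */
	DEMO_REF_TRAILING_CONTENT  /* trailingRefContent (INFO) */
};

/* "hexsz" is the hex object ID length: 40 for SHA-1, 64 for SHA-256. */
static enum demo_ref_check demo_check_regular_ref(const char *buf,
						  size_t hexsz)
{
	const char *trailing;
	size_t i;

	assert(buf != NULL);
	/* Stand-in for OID parsing; stops safely at the terminating NUL. */
	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return DEMO_REF_BAD_CONTENT;
	trailing = buf + hexsz;

	/* A non-space right after the object ID is a parse failure. */
	if (*trailing && !isspace((unsigned char)*trailing))
		return DEMO_REF_BAD_CONTENT;

	if (*trailing == '\0')
		return DEMO_REF_MISSING_NEWLINE;
	if (*trailing != '\n' || trailing[1] != '\0')
		return DEMO_REF_TRAILING_CONTENT;
	return DEMO_REF_OK;
}
```

Fed the contents used by the tests in this patch (for example
"<oid> garbage", "<oid>\n\n\n" or "<oid>x"), the sketch yields the same
category as the expected "git refs verify" output.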

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  11 ++++
 fsck.h                        |   3 +
 refs.c                        |   2 +-
 refs/files-backend.c          |  68 ++++++++++++++++++-
 refs/refs-internal.h          |   2 +-
 t/t0602-reffiles-fsck.sh      | 120 ++++++++++++++++++++++++++++++++++
 6 files changed, 202 insertions(+), 4 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..06d045ac48 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has a bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
@@ -170,6 +173,14 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A ref does not end with newline. This kind of ref may
+	be considered ERROR in the future.
+
+`trailingRefContent`::
+	(INFO) A ref has trailing contents. This kind of ref may be
+	considered ERROR in the future.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
@@ -84,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 74de3d3009..5e74881945 100644
--- a/refs.c
+++ b/refs.c
@@ -1758,7 +1758,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 890d0324e1..0187b85c5f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -560,7 +560,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -597,7 +597,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -619,6 +619,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	struct fsck_ref_report report = {0};
+	const char *trailing = NULL;
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = error_errno(_("%s/%s: unable to read the ref"),
+				  refs_check_dir, iter->relative_path);
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &trailing, &failure_errno)) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "invalid ref content");
+		goto cleanup;
+	}
+
+	if (!(type & REF_ISSYMREF)) {
+		if (*trailing == '\0') {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "missing newline");
+			goto cleanup;
+		}
+
+		if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "trailing garbage in ref");
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3575,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..a06ad044f2 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,124 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/branch-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-1 &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-2 &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-3 &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-4 &&
+	test_cmp expect err &&
+
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-1 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-2 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
+	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0



* [PATCH v3 3/4] ref: add symref content check for files backend
  2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
  2024-09-03 12:20       ` [PATCH v3 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-09-03 12:20       ` [PATCH v3 2/4] ref: add regular ref content check for files backend shejialuo
@ 2024-09-03 12:20       ` shejialuo
  2024-09-09 15:04         ` Patrick Steinhardt
  2024-09-10 22:19         ` karthik nayak
  2024-09-03 12:21       ` [PATCH v3 4/4] ref: add symlink ref " shejialuo
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
  4 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-09-03 12:20 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced the checks for regular refs, so there is no
need to check the consistency of the target that the symref points to.
Instead, we only need to check the content of the symref itself.

To check the content of the symref, create a function
"files_fsck_symref_target". It first checks whether the "referent" is
under the "refs/" directory and then checks the symref contents.

A regular file is accepted as a textual symref if it begins with
"ref:", followed by zero or more whitespace characters, followed by the
full refname, followed only by whitespace characters. We always write
a single SP after "ref:" and a single LF after the refname, but
third-party reimplementations of Git may have taken advantage of the
looser syntax. More specifically, we accept the following symref
contents:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

But we do not allow any other trailing garbage. The following are bad
symref contents, which "git-fsck(1)" reports as errors:

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

To provide the above checks, we first check whether the symref content
is missing a newline by peeking at the last byte of the "referent" to
see whether it is '\n'.

We then remember the untrimmed length of the "referent" and call
"strbuf_rtrim()" on it. Next, we call "check_refname_format" to check
whether the trimmed referent has a valid format. If not, we report to
the user that the symref points to a referent with an invalid format.
If it is valid, we compare the untrimmed and trimmed lengths; if they
differ, we warn the user that there is trailing garbage in the symref
content.
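
The newline and length bookkeeping described above can be sketched
standalone. This is a hedged illustration with hypothetical names
("demo_check_symref_referent" and the "DEMO_*" bits). The expected
length mirrors "referent->len - 1" plus the "len++" on a missing
newline in the diff below, but "demo_looks_like_refname" is only a crude
placeholder for "check_refname_format()", which checks far more.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result bits; comments name the matching fsck IDs. */
#define DEMO_OUTSIDE_REFS (1 << 0) /* badSymrefTarget */
#define DEMO_BAD_FORMAT   (1 << 1) /* badSymrefTarget */
#define DEMO_MISSING_NL   (1 << 2) /* refMissingNewline */
#define DEMO_TRAILING     (1 << 3) /* trailingRefContent */

/* Crude placeholder: rejects whitespace and components starting '.'. */
static int demo_looks_like_refname(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (isspace((unsigned char)name[i]))
			return 0;
		if (name[i] == '.' && (i == 0 || name[i - 1] == '/'))
			return 0;
	}
	return len > 0;
}

/* "referent" is the content after "ref:" and any leading whitespace. */
static int demo_check_symref_referent(const char *referent)
{
	size_t raw_len, expected, trimmed;
	int flags = 0;

	assert(referent != NULL);
	if (strncmp(referent, "refs/", 5))
		return DEMO_OUTSIDE_REFS;

	/* Peek at the last byte: one trailing LF is expected and allowed. */
	raw_len = strlen(referent);
	if (referent[raw_len - 1] != '\n') {
		flags |= DEMO_MISSING_NL;
		expected = raw_len;
	} else {
		expected = raw_len - 1;
	}

	/* Same effect as strbuf_rtrim(): drop trailing whitespace. */
	trimmed = raw_len;
	while (trimmed > 0 && isspace((unsigned char)referent[trimmed - 1]))
		trimmed--;

	if (!demo_looks_like_refname(referent, trimmed))
		return flags | DEMO_BAD_FORMAT;

	/* Anything trimmed beyond the single expected LF is garbage. */
	if (trimmed != expected)
		flags |= DEMO_TRAILING;
	return flags;
}
```

With this ordering, "refs/heads/master garbage\n" fails the format
check (an error), while "refs/heads/branch \n  " only yields the
missing-newline and trailing-content reports, matching the split between
accepted and rejected contents listed above.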

Finally, we need to check whether the referent is a directory.
"check_refname_format" cannot tell us whether "refs/heads/a" is a
directory. We already check for bad file types when iterating over the
"refs/" directory, but directories are skipped there, so we need to add
an explicit check here.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   4 ++
 fsck.h                        |   1 +
 refs/files-backend.c          |  81 +++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 117 ++++++++++++++++++++++++++++++++++
 4 files changed, 203 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 06d045ac48..beb6c4e49e 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,10 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badSymrefTarget`::
+	(ERROR) The symref target points outside the ref directory or
+	the name of the symref target is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..5ea874916d 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_SYMREF_TARGET, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0187b85c5f..fef32e607f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3434,11 +3434,80 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+/*
+ * Check the symref "referent" and "referent_path". For textual symref,
+ * "referent" would be the content after "refs:".
+ */
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent,
+				    struct strbuf *referent_path)
+{
+	size_t len = referent->len - 1;
+	const char *p = NULL;
+	struct stat st;
+	int ret = 0;
+
+	if (!skip_prefix(referent->buf, "refs/", &p)) {
+
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      "points to ref outside the refs directory");
+		goto out;
+	}
+
+	if (referent->buf[referent->len - 1] != '\n') {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "missing newline");
+		len++;
+	}
+
+	strbuf_rtrim(referent);
+	if (check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      "points to refname with invalid format");
+		goto out;
+	}
+
+	if (len != referent->len) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "trailing garbage in ref");
+	}
+
+	/*
+	 * Missing target should not be treated as any error worthy event and
+	 * not even warn. It is a common case that a symbolic ref points to a
+	 * ref that does not exist yet. If the target ref does not exist, just
+	 * skip the check for the file type.
+	 */
+	if (lstat(referent_path->buf, &st))
+		goto out;
+
+	/*
+	 * We cannot distinguish whether "refs/heads/a" is directory or nots by
+	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
+	 * check the file type of the target.
+	 */
+	if (S_ISDIR(st.st_mode)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      "points to the directory");
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
 				   struct dir_iterator *iter)
 {
+	struct strbuf referent_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
@@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "trailing garbage in ref");
 			goto cleanup;
 		}
+	} else {
+		strbuf_addf(&referent_path, "%s/%s",
+			    ref_store->gitdir, referent.buf);
+		/*
+		 * the referent may contain the spaces and the newline, need to
+		 * trim for path.
+		 */
+		strbuf_rtrim(&referent_path);
+		ret = files_fsck_symref_target(o, &report,
+					       &referent,
+					       &referent_path);
 	}
 
 cleanup:
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&referent_path);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index a06ad044f2..e0bf8c8c8b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -209,4 +209,121 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-no-newline-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err &&
+
+	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-bad-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
+	EOF
+	rm $branch_dir_prefix/branch-bad-3 &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
+	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
+	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
+	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0



* [PATCH v3 4/4] ref: add symlink ref content check for files backend
  2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
                         ` (2 preceding siblings ...)
  2024-09-03 12:20       ` [PATCH v3 3/4] ref: add symref " shejialuo
@ 2024-09-03 12:21       ` shejialuo
  2024-09-09 15:04         ` Patrick Steinhardt
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
  4 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-03 12:21 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced "files_fsck_symref_target". We should reuse
this function to handle symrefs that use legacy symbolic links. We
should not check for trailing garbage in symlink refs, so add a new
parameter "symbolic_link" to disable the checks that only apply to
textual symrefs.

We first use "strbuf_add_real_path" to resolve the symlink and get the
absolute path "referent_path" that the symlink ref points to. Then we
get the absolute path "abs_gitdir" of the "gitdir". By stripping
"abs_gitdir" from the front of "referent_path", we extract the
"referent", which lets us reuse "files_fsck_symref_target" to check
symlink refs as well.

Because we are going to drop support for "core.prefersymlinkrefs", add a
new fsck message "symlinkRef" to make the user aware of this.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  5 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 68 +++++++++++++++++++-----
 t/t0602-reffiles-fsck.sh      | 97 +++++++++++++++++++++++++++++++++++
 4 files changed, 157 insertions(+), 14 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index beb6c4e49e..9e8e1ac7f0 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -181,6 +181,11 @@
 	(INFO) A ref does not end with newline. This kind of ref may
 	be considered ERROR in the future.
 
+`symlinkRef`::
+	(INFO) A symref uses the symbolic link. This kind of symref may
+	be considered ERROR in the future when totally dropping the
+	symlink support.
+
 `trailingRefContent`::
 	(INFO) A ref has trailing contents. This kind of ref may be
 	considered ERROR in the future.
diff --git a/fsck.h b/fsck.h
index 5ea874916d..1c6f750812 100644
--- a/fsck.h
+++ b/fsck.h
@@ -87,6 +87,7 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index fef32e607f..2a1b952f0d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,4 +1,5 @@
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../copy.h"
 #include "../environment.h"
 #include "../gettext.h"
@@ -1950,10 +1951,13 @@ static int commit_ref_update(struct files_ref_store *refs,
 	return 0;
 }
 
+#ifdef NO_SYMLINK_HEAD
+#define create_ref_symlink(a, b) (-1)
+#else
 static int create_ref_symlink(struct ref_lock *lock, const char *target)
 {
 	int ret = -1;
-#ifndef NO_SYMLINK_HEAD
+
 	char *ref_path = get_locked_file_path(&lock->lk);
 	unlink(ref_path);
 	ret = symlink(target, ref_path);
@@ -1961,13 +1965,12 @@ static int create_ref_symlink(struct ref_lock *lock, const char *target)
 
 	if (ret)
 		fprintf(stderr, "no symlink - falling back to symbolic ref\n");
-#endif
 	return ret;
 }
+#endif
 
-static int create_symref_lock(struct files_ref_store *refs,
-			      struct ref_lock *lock, const char *refname,
-			      const char *target, struct strbuf *err)
+static int create_symref_lock(struct ref_lock *lock, const char *target,
+			      struct strbuf *err)
 {
 	if (!fdopen_lock_file(&lock->lk, "w")) {
 		strbuf_addf(err, "unable to fdopen %s: %s",
@@ -2583,8 +2586,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	}
 
 	if (update->new_target && !(update->flags & REF_LOG_ONLY)) {
-		if (create_symref_lock(refs, lock, update->refname,
-				       update->new_target, err)) {
+		if (create_symref_lock(lock, update->new_target, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
 			goto out;
 		}
@@ -3436,12 +3438,15 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 /*
  * Check the symref "referent" and "referent_path". For textual symref,
- * "referent" would be the content after "refs:".
+ * "referent" would be the content after "refs:". For symlink ref,
+ * "referent" would be the path relative to "gitdir", which should
+ * be the same as the textual symref literally.
  */
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    struct strbuf *referent,
-				    struct strbuf *referent_path)
+				    struct strbuf *referent_path,
+				    unsigned int symbolic_link)
 {
 	size_t len = referent->len - 1;
 	const char *p = NULL;
@@ -3456,14 +3461,16 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	if (referent->buf[referent->len - 1] != '\n') {
+	if (!symbolic_link && referent->buf[referent->len - 1] != '\n') {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_REF_MISSING_NEWLINE,
 				      "missing newline");
 		len++;
 	}
 
-	strbuf_rtrim(referent);
+	if (!symbolic_link)
+		strbuf_rtrim(referent);
+
 	if (check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_SYMREF_TARGET,
@@ -3471,7 +3478,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	if (len != referent->len) {
+	if (!symbolic_link && len != referent->len) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_TRAILING_REF_CONTENT,
 				      "trailing garbage in ref");
@@ -3509,9 +3516,11 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 {
 	struct strbuf referent_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
+	unsigned int symbolic_link = 0;
 	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
@@ -3521,8 +3530,37 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
 	report.path = refname.buf;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char* relative_referent_path;
+
+		symbolic_link = 1;
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&referent_path, iter->path.buf);
+
+		if (!skip_prefix(referent_path.buf,
+				 abs_gitdir.buf,
+				 &relative_referent_path)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_BAD_SYMREF_TARGET,
+					      "point to target outside gitdir");
+			goto cleanup;
+		}
+
+		strbuf_addstr(&referent, relative_referent_path);
+		ret = files_fsck_symref_target(o, &report,
+					       &referent, &referent_path,
+					       symbolic_link);
+
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		ret = error_errno(_("%s/%s: unable to read the ref"),
@@ -3563,7 +3601,8 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		strbuf_rtrim(&referent_path);
 		ret = files_fsck_symref_target(o, &report,
 					       &referent,
-					       &referent_path);
+					       &referent_path,
+					       symbolic_link);
 	}
 
 cleanup:
@@ -3571,6 +3610,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
 	strbuf_release(&referent_path);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index e0bf8c8c8b..e735816d5b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -326,4 +326,101 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-1 &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-2 &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-3 &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err &&
+
+	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-2 &&
+	test_cmp expect err
+'
+
+test_expect_success SYMLINKS 'symlink symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
+	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
+	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
+	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
+	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
+	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0



* Re: [PATCH v3 2/4] ref: add regular ref content check for files backend
  2024-09-03 12:20       ` [PATCH v3 2/4] ref: add regular ref content check for files backend shejialuo
@ 2024-09-09 15:04         ` Patrick Steinhardt
  2024-09-10  7:42           ` shejialuo
  2024-09-10 16:07         ` karthik nayak
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-09-09 15:04 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Tue, Sep 03, 2024 at 08:20:46PM +0800, shejialuo wrote:
> We implicitly rely on "git-fsck(1)" to check the consistency of regular
> refs. However, when parsing the regular refs for files backend by using
> "files-backend.c::parse_loose_ref_contents", we allow the ref content to
> end with no newline or to contain some garbages.
> 
> Even though we never create such loose refs ourselves, we have accepted
> such loose refs. So, it is entirely possible that some third-party tools
> may rely on such loose refs being valid. We should not report an error
> fsck message at current. But let's notice such a "curiously formatted"
> loose refs being valid and tell the user our findings, so we can access
> the possible extent of damage when we tighten the parsing rules in the
> future.
> 
> And it's not suitable to either report a warn fsck message to the user.

s/to either/either to

> This is because if the caller set the "strict" field in "fsck_options",
> fsck warns will be automatically upgraded to errors. We should not allow
> user to specify the "--strict" flag to upgrade the fsck warnings to
> errors at current.

This is formulated a bit curiously: it reads as if we wanted to limit
what the user can do, but what we really want to ensure is that the
`--strict` flag doesn't convert it into an error. So maybe something
like this instead of the second sentence:

    We don't (yet) want the "--strict" flag that controls this bit to
    end up generating errors for such weirdly-formatted reference
    contents, as we first want to assess whether this retroactive
    tightening will cause issues for any tools out there.

> It might cause compatibility issue which may break

s/issue/issues

> the legacy repository. So we add the following two fsck infos to

I wouldn't call it "legacy" just yet, as we haven't decided yet whether
we're going to make this formatting invalid in the first place. It's
rather a test balloon.

> represent the situation where the ref content ends without newline or has
> garbages:

s/garbages/trailing garbage

> 1. "refMissingNewline(INFO)": A ref does not end with newline. This kind
>    of ref may be considered ERROR in the future.
> 2. "trailingRefContent(INFO)": A ref has trailing contents. This kind of
>    ref may be considered ERROR in the future.

In both cases, "may be considered ERROR" -> "may be considered an
error". Also in the actual messages.

> It may seem that we could not give the user any warnings by creating
> fsck infos. However, in "fsck.c::fsck_vreport", we will convert
> "FSCK_INFO" to "FSCK_WARN" and we can still warn the user about these
> situations when using "git-refs verify" without introducing

s/"git-refs verify"/"git refs verify". We don't use dashed builtins
anymore.

> compatibility issue.

s/issue/issues

> In current "git-fsck(1)", it will report an error when the ref content
> is bad, so we should following this to report an error to the user when
> "parse_loose_ref_contents" fails. And we add a new fsck error message
> called "badRefContent(ERROR)" to represent that a ref has a bad content.

Okay, so this is basically porting over behaviour that git-fsck(1)
already has to `git refs verify` and should thus not cause new issues
anywhere. I think it would have made sense to do so in a first step and
then introduce the tightened rules in a separate commit.

Will we eventually remove those checks from git-fsck(1) when we adapt it
to call `git refs verify`? If so, we should likely note that in the
commit message.

> In order to tell whether the ref has trailing content, add a new
> parameter "trailing" to "parse_loose_ref_contents". Then introduce a new
> function "files_fsck_refs_content" to check the regular refs to enhance
> the "git-refs verify".

This paragraph only re-explains what the diff already tells us, so it
can likely be removed.

> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  Documentation/fsck-msgids.txt |  11 ++++
>  fsck.h                        |   3 +
>  refs.c                        |   2 +-
>  refs/files-backend.c          |  68 ++++++++++++++++++-
>  refs/refs-internal.h          |   2 +-
>  t/t0602-reffiles-fsck.sh      | 120 ++++++++++++++++++++++++++++++++++
>  6 files changed, 202 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index 68a2801f15..06d045ac48 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -19,6 +19,9 @@
>  `badParentSha1`::
>  	(ERROR) A commit object has a bad parent sha1.
>  
> +`badRefContent`::
> +	(ERROR) A ref has a bad content.
> +

s/a bad content/bad content

>  `badRefFiletype`::
>  	(ERROR) A ref has a bad file type.
>  
> @@ -170,6 +173,14 @@
>  `nullSha1`::
>  	(WARN) Tree contains entries pointing to a null sha1.
>  
> +`refMissingNewline`::
> +	(INFO) A ref does not end with newline. This kind of ref may
> +	be considered ERROR in the future.
> +

I'd reformulate the second sentence to "This will be considered an error
in the future". This signals to users that we intend to tighten this
check and urges them to speak up in case they disagree with such a
tightening.

> +`trailingRefContent`::
> +	(INFO) A ref has trailing contents. This kind of ref may be
> +	considered ERROR in the future.

Same.

> @@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	struct fsck_ref_report report = {0};
> +	const char *trailing = NULL;
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISLNK(iter->st.st_mode))
> +		goto cleanup;
> +
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = error_errno(_("%s/%s: unable to read the ref"),
> +				  refs_check_dir, iter->relative_path);

In error messages we typically put the name of the thing being read at
the end, not at the start. So this should rather be "unable to read ref
'%s/%s'".

> +		goto cleanup;
> +	}
> +
> +	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
> +				     ref_content.buf, &oid, &referent,
> +				     &type, &trailing, &failure_errno)) {
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_BAD_REF_CONTENT,
> +				      "invalid ref content");
> +		goto cleanup;
> +	}
> +
> +	if (!(type & REF_ISSYMREF)) {

Coming back to my comment further up, I guess this whole block here
could be introduced in a separate commit. So the first commit introduces
the infra to check loose ref contents as an obvious step because we
simply port over rules that already exist in git-fsck(1). And the second
step could then do this retroactive tightening with the justification
you have spelt out in the commit message.

> +		if (*trailing == '\0') {

`if (!*trailing)`

Patrick


* Re: [PATCH v3 3/4] ref: add symref content check for files backend
  2024-09-03 12:20       ` [PATCH v3 3/4] ref: add symref " shejialuo
@ 2024-09-09 15:04         ` Patrick Steinhardt
  2024-09-10  8:02           ` shejialuo
  2024-09-10 22:19         ` karthik nayak
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-09-09 15:04 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Tue, Sep 03, 2024 at 08:20:54PM +0800, shejialuo wrote:
> We have already introduced the checks for regular refs. There is no need
> to check the consistency of the target which the symref points to.
> Instead, we just need to check the content of teh symref itself.

s/teh/the

> In order to check the content of the symref, create a function
> "files_fsck_symref_target". It will first check whether the "referent"
> is under the "refs/" directory and then we will check the symref
> contents.
> 
> A regular file is accepted as a textual symref if it begins with
> "ref:", followed by zero or more whitespaces, followed by the full
> refname, followed only by whitespace characters. We always write
> a single SP after "ref:" and a single LF after the refname, but
> third-party reimplementations of Git may have taken advantage of the
> looser syntax. Put it more specific, we accept the following contents
> of the symref:
> 
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
> 
> But we do not allow any other trailing garbage. The followings are bad
> symref contents which will be reported as fsck error by "git-fsck(1)".
> 
> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
> 
> In order to provide above checks, we will first check whether the symref
> content misses the newline by peeking the last byte of the "referent" to
> see whether it is '\n'.

I'd still argue that we should do the same retroactive tightening as we
introduce for normal references, also with an INFO level at first.
Otherwise we're being inconsistent across the ref types.

> And we will remember the untrimmed length of the "referent" and call
> "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format"
> to chceck whether the trimmed referent format is valid. If not, we will
> report to the user that the symref points to referent which has invalid
> format. If it is valid, we will compare the untrimmed length and trimmed
> length, if they are not the same, we need to warn the user there is some
> trailing garbage in the symref content.
> 
> At last, we need to check whether the referent is the directory. We
> cannot distinguish whether the "refs/heads/a" is a directory or not by
> using "check_refname_format". We have already checked bad file type when
> iterating the "refs/" directory but we ignore the directory. Thus, we
> need to explicitly add check here.
> 
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  Documentation/fsck-msgids.txt |   4 ++
>  fsck.h                        |   1 +
>  refs/files-backend.c          |  81 +++++++++++++++++++++++
>  t/t0602-reffiles-fsck.sh      | 117 ++++++++++++++++++++++++++++++++++
>  4 files changed, 203 insertions(+)
> 
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index 06d045ac48..beb6c4e49e 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -28,6 +28,10 @@
>  `badRefName`::
>  	(ERROR) A ref has an invalid format.
>  
> +`badSymrefTarget`::
> +	(ERROR) The symref target points outside the ref directory or
> +	the name of the symref target is invalid.

These are two separate error cases, and we even have different code
paths raising them. Shouldn't we thus also have two different diagnostic
codes for this?

>  `badTagName`::
>  	(INFO) A tag has an invalid format.
>  
> diff --git a/fsck.h b/fsck.h
> index b85072df57..5ea874916d 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -34,6 +34,7 @@ enum fsck_msg_type {
>  	FUNC(BAD_REF_CONTENT, ERROR) \
>  	FUNC(BAD_REF_FILETYPE, ERROR) \
>  	FUNC(BAD_REF_NAME, ERROR) \
> +	FUNC(BAD_SYMREF_TARGET, ERROR) \
>  	FUNC(BAD_TIMEZONE, ERROR) \
>  	FUNC(BAD_TREE, ERROR) \
>  	FUNC(BAD_TREE_SHA1, ERROR) \
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0187b85c5f..fef32e607f 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3434,11 +3434,80 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +/*
> + * Check the symref "referent" and "referent_path". For textual symref,
> + * "referent" would be the content after "refs:".
> + */
> +static int files_fsck_symref_target(struct fsck_options *o,
> +				    struct fsck_ref_report *report,
> +				    struct strbuf *referent,
> +				    struct strbuf *referent_path)
> +{
> +	size_t len = referent->len - 1;
> +	const char *p = NULL;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!skip_prefix(referent->buf, "refs/", &p)) {
> +

There's a superfluous newline here.

Also, you never use the value of `p`, so you can instead use
`starts_with()`.

> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_TARGET,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	if (referent->buf[referent->len - 1] != '\n') {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_REF_MISSING_NEWLINE,
> +				      "missing newline");
> +		len++;
> +	}
> +
> +	strbuf_rtrim(referent);
> +	if (check_refname_format(referent->buf, 0)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_TARGET,
> +				      "points to refname with invalid format");
> +		goto out;
> +	}
> +
> +	if (len != referent->len) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_TRAILING_REF_CONTENT,
> +				      "trailing garbage in ref");
> +	}
> +
> +	/*
> +	 * Missing target should not be treated as any error worthy event and
> +	 * not even warn. It is a common case that a symbolic ref points to a
> +	 * ref that does not exist yet. If the target ref does not exist, just
> +	 * skip the check for the file type.
> +	 */
> +	if (lstat(referent_path->buf, &st))
> +		goto out;

We may also want to verify that `errno == ENOENT` here.

Patrick


* Re: [PATCH v3 4/4] ref: add symlink ref content check for files backend
  2024-09-03 12:21       ` [PATCH v3 4/4] ref: add symlink ref " shejialuo
@ 2024-09-09 15:04         ` Patrick Steinhardt
  2024-09-10  8:28           ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-09-09 15:04 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Tue, Sep 03, 2024 at 08:21:03PM +0800, shejialuo wrote:
> We have already introduced "files_fsck_symref_target". We should reuse
> this function to handle the symrefs which use legacy symbolic links. We
> should not check the trailing garbage for symbolic refs. Add a new
> parameter "symbolic_link" to disable some checks which should only be
> executed for textual symrefs.
> 
> We firstly use the "strbuf_add_real_path" to resolve the symlink and
> get the absolute path "referent_path" which the symlink ref points
> to. Then we can get the absolute path "abs_gitdir" of the "gitdir".
> By combining "referent_path" and "abs_gitdir", we can extract the
> "referent". Thus, we can reuse "files_fsck_symref_target" function to
> seamlessly check the symlink refs.
> 
> Because we are going to drop support for "core.prefersymlinkrefs", add a
> new fsck message "symlinkRef" to let the user be aware of this
> information.

I don't think we have fully decided to drop support for symrefs via
symbolic links yet, so this is a tad too strong of a statement. I'd
rather say that we consider deprecating it in the future, but first need
to assess whether such symrefs are still in use.

Also, didn't we say that we'd want to remove support for _writing_
symbolic links, but not for reading them? Not 100% sure though.

> @@ -1961,13 +1965,12 @@ static int create_ref_symlink(struct ref_lock *lock, const char *target)
>  
>  	if (ret)
>  		fprintf(stderr, "no symlink - falling back to symbolic ref\n");
> -#endif
>  	return ret;
>  }
> +#endif
>  
> -static int create_symref_lock(struct files_ref_store *refs,
> -			      struct ref_lock *lock, const char *refname,
> -			      const char *target, struct strbuf *err)
> +static int create_symref_lock(struct ref_lock *lock, const char *target,
> +			      struct strbuf *err)
>  {
>  	if (!fdopen_lock_file(&lock->lk, "w")) {
>  		strbuf_addf(err, "unable to fdopen %s: %s",
> @@ -2583,8 +2586,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>  	}
>  
>  	if (update->new_target && !(update->flags & REF_LOG_ONLY)) {
> -		if (create_symref_lock(refs, lock, update->refname,
> -				       update->new_target, err)) {
> +		if (create_symref_lock(lock, update->new_target, err)) {
>  			ret = TRANSACTION_GENERIC_ERROR;
>  			goto out;
>  		}

Why does the writing side need to change?

> @@ -3509,9 +3516,11 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
>  {
>  	struct strbuf referent_path = STRBUF_INIT;
>  	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf abs_gitdir = STRBUF_INIT;
>  	struct strbuf referent = STRBUF_INIT;
>  	struct strbuf refname = STRBUF_INIT;
>  	struct fsck_ref_report report = {0};
> +	unsigned int symbolic_link = 0;

This variable isn't needed, as both code paths that end up using it could
just statically pass `1` or `0`.

>  	const char *trailing = NULL;
>  	unsigned int type = 0;
>  	int failure_errno = 0;
> @@ -3521,8 +3530,37 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
>  	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
>  	report.path = refname.buf;
>  
> -	if (S_ISLNK(iter->st.st_mode))
> +	if (S_ISLNK(iter->st.st_mode)) {
> +		const char* relative_referent_path;
> +
> +		symbolic_link = 1;
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_SYMLINK_REF,
> +				      "use deprecated symbolic link for symref");
> +
> +		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
> +		strbuf_normalize_path(&abs_gitdir);
> +		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
> +			strbuf_addch(&abs_gitdir, '/');
> +
> +		strbuf_add_real_path(&referent_path, iter->path.buf);
> +
> +		if (!skip_prefix(referent_path.buf,
> +				 abs_gitdir.buf,
> +				 &relative_referent_path)) {
> +			ret = fsck_report_ref(o, &report,
> +					      FSCK_MSG_BAD_SYMREF_TARGET,
> +					      "point to target outside gitdir");
> +			goto cleanup;
> +		}
> +
> +		strbuf_addstr(&referent, relative_referent_path);
> +		ret = files_fsck_symref_target(o, &report,
> +					       &referent, &referent_path,
> +					       symbolic_link);
> +
>  		goto cleanup;
> +	}
>  
>  	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
>  		ret = error_errno(_("%s/%s: unable to read the ref"),

Patrick

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v3 2/4] ref: add regular ref content check for files backend
  2024-09-09 15:04         ` Patrick Steinhardt
@ 2024-09-10  7:42           ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-09-10  7:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Sep 09, 2024 at 05:04:07PM +0200, Patrick Steinhardt wrote:
> > This is because if the caller set the "strict" field in "fsck_options",
> > fsck warns will be automatically upgraded to errors. We should not allow
> > user to specify the "--strict" flag to upgrade the fsck warnings to
> > errors at current.
> 
> This is formulated a bit curiously: it reads as if we wanted to limit
> what the user can do, but what we really want to ensure is that the
> `--strict` flag doesn't convert it into an error. So maybe something
> like this instead of the second sentence:
> 
>     We don't (yet) want the "--strict" flag that controls this bit to
>     end up generating errors for such weirdly-formatted reference
>     contents, as we first want to assess whether this retroactive
>     tightening will cause issues for any tools out there.
> 

Thanks, I will improve this in the next version.

> > the legacy repository. So we add the following two fsck infos to
> 
> I wouldn't call it "legacy" just yet, as we didn't yet decide whether
> we're going to make this formatting invalid in the first place. It's
> rather a test balloon.
> 

I agree, we should drop "legacy" here.

> > In current "git-fsck(1)", it will report an error when the ref content
> > is bad, so we should following this to report an error to the user when
> > "parse_loose_ref_contents" fails. And we add a new fsck error message
> > called "badRefContent(ERROR)" to represent that a ref has a bad content.
> 
> Okay, so this is basically porting over behaviour that git-fsck(1)
> already has to `git refs verify` and should thus not cause new issues
> anywhere. I think it would have made sense to do so in a first step and
> then introduce the tightened rules in a separate commit.
> 

Having read all the comments, I think we'd better create a commit that
ports the existing checks to "git refs verify" for both regular refs and
symrefs.

So in the next version I will add more commits, in the following
sequence:

1. Set up the infrastructure to check the contents for refs.
2. Port existing checks in "git-fsck(1)" to "git refs verify".
3. Introduce the tightened rules.

> Will we eventually remove those checks from git-fsck(1) when we adapt it
> to call `git refs verify`? If so, we should likely note that in the
> commit message.

We should do this. As we discussed before, "git-fsck(1)" implicitly
checks some refs, which makes the code hard to understand.

> Coming back to my comment further up, I guess this whole block here
> could be introduced in a separate commit. So the first commit introduces
> the infra to check loose ref contents as an obvious step because we
> simply port over rules that already exist in git-fsck(1). And the second
> step could then do this retroactive tightening with the justification
> you have spelt out in the commit message.

Yes, that will be much clearer. So I should not simply classify the
situations by ref type.

Thanks,
Jialuo


* Re: [PATCH v3 3/4] ref: add symref content check for files backend
  2024-09-09 15:04         ` Patrick Steinhardt
@ 2024-09-10  8:02           ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-09-10  8:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Sep 09, 2024 at 05:04:11PM +0200, Patrick Steinhardt wrote:
> > In order to provide above checks, we will first check whether the symref
> > content misses the newline by peeking the last byte of the "referent" to
> > see whether it is '\n'.
> 
> I'd still argue that we should do the same retroactive tightening as we
> introduce for normal references, also with an INFO level at first.
> Otherwise we're being inconsistent across the ref types.
> 

Actually, for the above situations we use the same fsck message ids
introduced in [PATCH v3 2/4], and I think we must mention this in this
commit message.

But it makes me wonder: should we use a separate commit to introduce
these two fsck message ids?

> > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> > index 06d045ac48..beb6c4e49e 100644
> > --- a/Documentation/fsck-msgids.txt
> > +++ b/Documentation/fsck-msgids.txt
> > @@ -28,6 +28,10 @@
> >  `badRefName`::
> >  	(ERROR) A ref has an invalid format.
> >  
> > +`badSymrefTarget`::
> > +	(ERROR) The symref target points outside the ref directory or
> > +	the name of the symref target is invalid.
> 
> These are two separate error cases, and we even have different code
> paths raising them. Shouldn't we thus also have two different diagnostic
> codes for this?
> 

I agree. I will fix this in the next version.

> > +	if (!skip_prefix(referent->buf, "refs/", &p)) {
> > +
> 
> There's a superfluous newline here.
> 
> Also, you never use the value of `p`, so you can instead use
> `starts_with()`.
> 

Thanks. Actually, I searched the code for "is_prefix", but I didn't
think of "starts_with".
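For reference, git's `skip_prefix()` both tests a prefix and hands back a pointer past it, while `starts_with()` only answers yes or no. The simplified stand-ins below use hypothetical `my_` names (they are not copies of git's own helpers) and show why the boolean form is enough when the remainder `p` is never used.

```c
#include <string.h>

/* Simplified stand-in for git's starts_with(): a yes/no prefix test. */
static int my_starts_with(const char *str, const char *prefix)
{
	return !strncmp(str, prefix, strlen(prefix));
}

/*
 * Simplified stand-in for git's skip_prefix(): on a match, additionally
 * stores a pointer just past the prefix in *out and returns 1.
 */
static int my_skip_prefix(const char *str, const char *prefix,
			  const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}
```

In the symref check only the yes/no answer matters ("refs/heads/main" passes, "reflogs/heads/main" does not), so the `starts_with()` form avoids carrying an unused variable.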

> > +	/*
> > +	 * Missing target should not be treated as any error worthy event and
> > +	 * not even warn. It is a common case that a symbolic ref points to a
> > +	 * ref that does not exist yet. If the target ref does not exist, just
> > +	 * skip the check for the file type.
> > +	 */
> > +	if (lstat(referent_path->buf, &st))
> > +		goto out;
> 
> We may also want to verify that `errno == ENOENT` here.
> 

I agree. If "errno != ENOENT", we should report this failure, which is
unrelated to the ref content, to the user.
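A minimal standalone sketch of the behaviour agreed on here, using plain POSIX `lstat(2)`; `classify_referent()` and its enum are hypothetical, not git code. A missing referent (`ENOENT`) is treated as a dangling symref and accepted silently, any other `lstat()` failure is reported, and a directory is flagged because `check_refname_format()` alone cannot tell a file from a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical outcomes for the path a symref points to. */
enum referent_status {
	REFERENT_OK,         /* exists and is not a directory */
	REFERENT_DANGLING,   /* does not exist (yet): common, not reported */
	REFERENT_IS_DIR,     /* e.g. "refs/heads/a" is a directory */
	REFERENT_STAT_ERROR  /* lstat() failed for another reason */
};

static enum referent_status classify_referent(const char *referent_path)
{
	struct stat st;

	if (lstat(referent_path, &st)) {
		/* Only a missing target is harmless; surface everything else. */
		if (errno == ENOENT)
			return REFERENT_DANGLING;
		fprintf(stderr, "error: unable to stat '%s': %s\n",
			referent_path, strerror(errno));
		return REFERENT_STAT_ERROR;
	}

	if (S_ISDIR(st.st_mode))
		return REFERENT_IS_DIR;
	return REFERENT_OK;
}
```

With this rule, a referent such as `refs/heads/a/b` where `refs/heads/a` is a regular file makes `lstat()` fail with `ENOTDIR` rather than `ENOENT`, so it is reported instead of being treated as dangling.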

Thanks,
Jialuo


* Re: [PATCH v3 4/4] ref: add symlink ref content check for files backend
  2024-09-09 15:04         ` Patrick Steinhardt
@ 2024-09-10  8:28           ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-09-10  8:28 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Sep 09, 2024 at 05:04:17PM +0200, Patrick Steinhardt wrote:
> > Because we are going to drop support for "core.prefersymlinkrefs", add a
> > new fsck message "symlinkRef" to let the user be aware of this
> > information.
> 
> I don't think we fully decided to drop support for symrefs via symbolic
> links yet, so this is a tad too strong of a statement. I'd rather say
> that we consider deprecating it in the future, but first need to assess
> whether they may still be used.
> 

Yes, that will be much better.

> Also, didn't we say that we'd want to remove support for _writing_
> symbolic links, but not for reading them? Not 100% sure though.
> 

I have re-read Junio's patch about the breaking changes. We will drop
support for writing symlink symrefs, but we may or may not drop support
for reading them. I will improve this in the next version.
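On the read side, the patch resolves the link with git's `strbuf_add_real_path()` and then uses `skip_prefix()` against the normalized absolute gitdir (with a trailing '/') to obtain a referent such as "refs/heads/main". A rough standalone equivalent, assuming POSIX `realpath(3)` in place of git's helpers; `relative_to_gitdir()` and `symlink_referent()` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pure string check: if the resolved path lies strictly inside gitdir,
 * return a pointer to the gitdir-relative part (e.g. "refs/heads/main"),
 * otherwise NULL. Both arguments must be absolute, already-resolved paths.
 */
static const char *relative_to_gitdir(const char *gitdir, const char *resolved)
{
	size_t len = strlen(gitdir);

	/* Ignore trailing slashes, so "/repo/.git/" behaves like "/repo/.git". */
	while (len && gitdir[len - 1] == '/')
		len--;
	if (strncmp(resolved, gitdir, len))
		return NULL;
	/* Require a separator so "/repo/.git2/x" does not match "/repo/.git". */
	if (resolved[len] != '/' || !resolved[len + 1])
		return NULL;
	return resolved + len + 1;
}

/*
 * Resolve a symlink ref and copy its gitdir-relative target into out.
 * Returns 0 on success, -1 if the link cannot be resolved or escapes gitdir.
 */
static int symlink_referent(const char *gitdir_real, const char *link_path,
			    char *out, size_t outsz)
{
	char resolved[PATH_MAX];
	const char *rel;
	size_t n;

	if (!realpath(link_path, resolved))
		return -1;
	rel = relative_to_gitdir(gitdir_real, resolved);
	if (!rel)
		return -1;
	n = strlen(rel);
	if (n >= outsz)
		return -1;
	memcpy(out, rel, n + 1);
	return 0;
}
```

One difference to keep in mind: POSIX `realpath()` fails when the link target does not exist, so this sketch rejects dangling symlinks outright, whereas the patch relies on git's own real-path helper.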

> >  	if (update->new_target && !(update->flags & REF_LOG_ONLY)) {
> > -		if (create_symref_lock(refs, lock, update->refname,
> > -				       update->new_target, err)) {
> > +		if (create_symref_lock(lock, update->new_target, err)) {
> >  			ret = TRANSACTION_GENERIC_ERROR;
> >  			goto out;
> >  		}
> 
> Why does the writing side need to change?
> 

I squashed in two patches from Junio to sync with the "master" branch
so that the build passes. This is because Peff introduced the "UNUSED"
check to the build.

So we can just ignore this part.

Thanks,
Jialuo


* Re: [PATCH v3 2/4] ref: add regular ref content check for files backend
  2024-09-03 12:20       ` [PATCH v3 2/4] ref: add regular ref content check for files backend shejialuo
  2024-09-09 15:04         ` Patrick Steinhardt
@ 2024-09-10 16:07         ` karthik nayak
  2024-09-13 10:25           ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: karthik nayak @ 2024-09-10 16:07 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4000 bytes --]

shejialuo <shejialuo@gmail.com> writes:

> We implicitly rely on "git-fsck(1)" to check the consistency of regular
> refs. However, when parsing the regular refs for files backend by using

Nit: s/for/in the/

> "files-backend.c::parse_loose_ref_contents", we allow the ref content to
> end with no newline or to contain some garbages.

The 'no newline' reads a bit oddly; perhaps: "we allow the ref's content
to end with garbage or without a newline."


> Even though we never create such loose refs ourselves, we have accepted
> such loose refs. So, it is entirely possible that some third-party tools
> may rely on such loose refs being valid. We should not report an error
> fsck message at current. But let's notice such a "curiously formatted"

s/such a/such/ since the next line uses 'refs' in plural form.

> loose refs being valid and tell the user our findings, so we can access

s/access/assess

> the possible extent of damage when we tighten the parsing rules in the
> future.
>

We could also rewrite the last sentence to make it a little clearer,
as "We should notify the users about such 'curiously formatted' loose
refs so that adequate care is taken before we decide to tighten the rules
in the future."

> And it's not suitable to either report a warn fsck message to the user.
> This is because if the caller set the "strict" field in "fsck_options",
> fsck warns will be automatically upgraded to errors. We should not allow
> user to specify the "--strict" flag to upgrade the fsck warnings to
> errors at current. It might cause compatibility issue which may break
> the legacy repository. So we add the following two fsck infos to

I think Patrick already touched on this, and I agree with his comments.

> represent the situation where the ref content ends without newline or has
> garbages:
>
> 1. "refMissingNewline(INFO)": A ref does not end with newline. This kind
>    of ref may be considered ERROR in the future.
> 2. "trailingRefContent(INFO)": A ref has trailing contents. This kind of

s/contents/content

>    ref may be considered ERROR in the future.
>
> It may seem that we could not give the user any warnings by creating

s/could/would

> fsck infos. However, in "fsck.c::fsck_vreport", we will convert

I think we can also rephrase this first sentence a little better,
perhaps:

    It might appear that we can't provide the user with any warnings by
    using FSCK_INFO.

> "FSCK_INFO" to "FSCK_WARN" and we can still warn the user about these
> situations when using "git-refs verify" without introducing
> compatibility issue.

s/issue/issues
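The point about `fsck_vreport()` is what makes INFO a safe level here. Below is a simplified model of the severity handling described in this thread, using hypothetical names (it is not git's fsck.c code) and assuming no per-message severity overrides are configured: `--strict` promotes only WARN to ERROR, and INFO is afterwards shown as a warning, so an INFO message never turns into an error.

```c
enum severity { SEV_IGNORE, SEV_INFO, SEV_WARN, SEV_ERROR };

/* Hypothetical model: the level a message with default severity dflt is reported at. */
static enum severity reported_severity(enum severity dflt, int strict)
{
	enum severity s = dflt;

	/* --strict upgrades warnings, and only warnings, to errors. */
	if (strict && s == SEV_WARN)
		s = SEV_ERROR;
	/* INFO is then presented to the user as a warning. */
	if (s == SEV_INFO)
		s = SEV_WARN;
	return s;
}
```

This is consistent with the expected test output later in the series, where `refMissingNewline` and `trailingRefContent` appear as `warning:` lines.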

> In current "git-fsck(1)", it will report an error when the ref content
> is bad, so we should following this to report an error to the user when
> "parse_loose_ref_contents" fails. And we add a new fsck error message
> called "badRefContent(ERROR)" to represent that a ref has a bad content.

I would rephrase this a bit, as:

    The "git-fsck(1)" command reports an error when the ref content is
    invalid. Following this, add a similar check to "git refs verify".
    Also add a new fsck error message called "badRefContent(ERROR)" to
    represent that a ref has invalid content.
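A minimal standalone sketch of how the regular-ref outcomes fit together, assuming a SHA-1 repository (40 hex digits) and assuming, as the current `parse_loose_ref_contents()` appears to do, that garbage glued directly to the hex digits is already a parse failure. `check_regular_ref()` and its enum are hypothetical and cover regular refs only (no "ref: " symrefs); the final two checks mirror the conditions the patch applies to `trailing`.

```c
#include <ctype.h>

/* Assumption: SHA-1 repository, so an object name is 40 hex digits. */
#define HEXSZ 40

/* Hypothetical outcomes for the content of a loose regular ref. */
enum regular_ref_verdict {
	REF_CONTENT_OK,       /* "<hex>\n" */
	REF_BAD_CONTENT,      /* ERROR: badRefContent */
	REF_MISSING_NEWLINE,  /* INFO: refMissingNewline */
	REF_TRAILING_CONTENT  /* INFO: trailingRefContent */
};

static enum regular_ref_verdict check_regular_ref(const char *content)
{
	const char *trailing;
	int i;

	/* Stops at the first non-hex byte, including a short string's NUL. */
	for (i = 0; i < HEXSZ; i++)
		if (!isxdigit((unsigned char)content[i]))
			return REF_BAD_CONTENT;
	trailing = content + HEXSZ;

	/* Garbage glued to the hex digits is a parse failure, not trailing content. */
	if (*trailing && !isspace((unsigned char)*trailing))
		return REF_BAD_CONTENT;

	if (!*trailing)
		return REF_MISSING_NEWLINE;
	if (*trailing != '\n' || trailing[1])
		return REF_TRAILING_CONTENT;
	return REF_CONTENT_OK;
}
```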

[snip]

> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	struct fsck_ref_report report = {0};
> +	const char *trailing = NULL;
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISLNK(iter->st.st_mode))
> +		goto cleanup;

Since we iterate over all refs, we don't need to check the target for a
symbolic link. So we skip all symbolic links. Makes sense. Would be nice
to have a comment here.

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]


* Re: [PATCH v3 3/4] ref: add symref content check for files backend
  2024-09-03 12:20       ` [PATCH v3 3/4] ref: add symref " shejialuo
  2024-09-09 15:04         ` Patrick Steinhardt
@ 2024-09-10 22:19         ` karthik nayak
  2024-09-12  4:00           ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: karthik nayak @ 2024-09-10 22:19 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3489 bytes --]

shejialuo <shejialuo@gmail.com> writes:

[snip]

> And we will remember the untrimmed length of the "referent" and call
> "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format"
> to chceck whether the trimmed referent format is valid. If not, we will

s/chceck/check

> report to the user that the symref points to referent which has invalid
> format. If it is valid, we will compare the untrimmed length and trimmed
> length, if they are not the same, we need to warn the user there is some
> trailing garbage in the symref content.
>
> At last, we need to check whether the referent is the directory. We

s/is the/is a/

> cannot distinguish whether the "refs/heads/a" is a directory or not by

It would be a little clearer if we say

   We cannot distinguish whether a given reference like 'refs/heads/a'
   is a file or a directory.

> using "check_refname_format". We have already checked bad file type when
> iterating the "refs/" directory but we ignore the directory. Thus, we
> need to explicitly add check here.
>

[snip]

> +/*
> + * Check the symref "referent" and "referent_path". For textual symref,
> + * "referent" would be the content after "refs:".
> + */
> +static int files_fsck_symref_target(struct fsck_options *o,
> +				    struct fsck_ref_report *report,
> +				    struct strbuf *referent,
> +				    struct strbuf *referent_path)
> +{
> +	size_t len = referent->len - 1;
> +	const char *p = NULL;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!skip_prefix(referent->buf, "refs/", &p)) {
> +
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_TARGET,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	if (referent->buf[referent->len - 1] != '\n') {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_REF_MISSING_NEWLINE,
> +				      "missing newline");
> +		len++;
> +	}
> +
> +	strbuf_rtrim(referent);
> +	if (check_refname_format(referent->buf, 0)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_TARGET,
> +				      "points to refname with invalid format");
> +		goto out;
> +	}
> +
> +	if (len != referent->len) {

Would this work with a symref containing:

    ref: refs/heads/feature\ngarbage\n

Since we check the last character and rtrim, wouldn't this bypass our
checks? Isn't it better to find the first `\n` and check if the index <
referent->len?

> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_TRAILING_REF_CONTENT,
> +				      "trailing garbage in ref");
> +	}
> +
> +	/*
> +	 * Missing target should not be treated as any error worthy event and
> +	 * not even warn. It is a common case that a symbolic ref points to a
> +	 * ref that does not exist yet. If the target ref does not exist, just
> +	 * skip the check for the file type.
> +	 */

I think the common terminology for this is 'dangling symref'. Perhaps we
could shorten this to simply say:

    Dangling symrefs are common and so we don't report them.

> +	if (lstat(referent_path->buf, &st))
> +		goto out;
> +
> +	/*
> +	 * We cannot distinguish whether "refs/heads/a" is directory or nots by

s/is/is a/
s/nots/not/

> +	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
> +	 * check the file type of the target.
> +	 */
> +	if (S_ISDIR(st.st_mode)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_TARGET,
> +				      "points to the directory");
> +		goto out;
> +	}
> +
> +out:
> +	return ret;
> +}
> +

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]


* Re: [PATCH v3 3/4] ref: add symref content check for files backend
  2024-09-10 22:19         ` karthik nayak
@ 2024-09-12  4:00           ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-09-12  4:00 UTC (permalink / raw)
  To: karthik nayak; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Tue, Sep 10, 2024 at 03:19:49PM -0700, karthik nayak wrote:

[snip]

> > +	if (referent->buf[referent->len - 1] != '\n') {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_REF_MISSING_NEWLINE,
> > +				      "missing newline");
> > +		len++;
> > +	}
> > +
> > +	strbuf_rtrim(referent);
> > +	if (check_refname_format(referent->buf, 0)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_TARGET,
> > +				      "points to refname with invalid format");
> > +		goto out;
> > +	}
> > +
> > +	if (len != referent->len) {
> 
> Would this work with a symref containing:
> 
>     ref: refs/heads/feature\ngarbage\n
> 
> Since we check the last character and rtrim, wouldn't this bypass our
> checks? Isn't it better to find the first `\n` and check if the index <
> referent->len?
> 

The above example is caught by "check_refname_format", which reports
the following message:

  error: ... : badSymrefTarget: points to refname with invalid format

From the context, I guess you are suggesting that we should report
trailing garbage in the ref. However, for the above situation we should
report an error, which aligns with the behavior of "git-fsck(1)".

So there is no need to check for trailing garbage once we have
encountered an error.

And we cannot use that approach. Consider, for example:

  ref: refs/heads/feature   \n

If we look for the index of the first '\n', in this example it will be
equal to "referent->len", so we would miss this case entirely.

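To make the two examples in this exchange concrete, here is a standalone re-creation of the v3 order of checks on the text after "ref: ". Both `check_symref_referent()` and `toy_refname_ok()` are hypothetical; the latter implements only a small subset of the rules `check_refname_format()` enforces (no control characters such as '\n', no space, ~, ^, :, ?, *, [ or backslash, no "..", no empty component or component starting with '.'), which is enough for these cases.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result flags; several may be set at once. */
#define SYMREF_ESCAPES_REFS     (1u << 0) /* does not start with "refs/" */
#define SYMREF_BAD_NAME         (1u << 1) /* fails the refname check */
#define SYMREF_MISSING_NEWLINE  (1u << 2) /* INFO-level in the patch */
#define SYMREF_TRAILING_GARBAGE (1u << 3) /* INFO-level in the patch */

/* Toy subset of check_refname_format()'s rules, for illustration only. */
static int toy_refname_ok(const char *name)
{
	const char *p;

	if (!*name || strstr(name, ".."))
		return 0;
	for (p = name; *p; p++) {
		unsigned char c = (unsigned char)*p;

		/* Control characters (including newline), DEL and a few punctuation marks. */
		if (c < 040 || c == 0177 || strchr(" ~^:?*[\\", c))
			return 0;
		/* No component may be empty or start with '.'. */
		if ((p == name || p[-1] == '/') && (c == '.' || c == '/'))
			return 0;
	}
	return p[-1] != '/';
}

/* Mirrors the v3 order of checks on the text after "ref: ". */
static unsigned check_symref_referent(const char *referent)
{
	char buf[256];
	size_t raw_len = strlen(referent), expected, trimmed;
	unsigned flags = 0;

	if (strncmp(referent, "refs/", 5))
		return SYMREF_ESCAPES_REFS;
	if (raw_len >= sizeof(buf))
		return SYMREF_BAD_NAME; /* toy buffer limit, not a git rule */

	/* Expected length once the single terminating newline is dropped. */
	expected = raw_len - 1;
	if (referent[raw_len - 1] != '\n') {
		flags |= SYMREF_MISSING_NEWLINE;
		expected++;
	}

	/* Like strbuf_rtrim(): strip all trailing whitespace. */
	memcpy(buf, referent, raw_len + 1);
	trimmed = raw_len;
	while (trimmed && isspace((unsigned char)buf[trimmed - 1]))
		buf[--trimmed] = '\0';

	if (!toy_refname_ok(buf))
		return flags | SYMREF_BAD_NAME;
	/* Anything trimmed beyond one final newline counts as trailing garbage. */
	if (expected != trimmed)
		flags |= SYMREF_TRAILING_GARBAGE;
	return flags;
}
```

With this order, the embedded-newline example fails only the name check, while the trailing-spaces example passes the name check and is caught by the length comparison.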
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_TRAILING_REF_CONTENT,
> > +				      "trailing garbage in ref");
> > +	}
> > +
> > +	/*
> > +	 * Missing target should not be treated as any error worthy event and
> > +	 * not even warn. It is a common case that a symbolic ref points to a
> > +	 * ref that does not exist yet. If the target ref does not exist, just
> > +	 * skip the check for the file type.
> > +	 */
> 
> I think the common terminology for this is 'dangling symref'. Perhaps we
> could shorten this to simply say:
> 
>     Dangling symrefs are common and so we don't report them.
> 

Thanks, I will improve this in the next version.


* Re: [PATCH v3 2/4] ref: add regular ref content check for files backend
  2024-09-10 16:07         ` karthik nayak
@ 2024-09-13 10:25           ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-09-13 10:25 UTC (permalink / raw)
  To: karthik nayak; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Tue, Sep 10, 2024 at 09:07:15AM -0700, karthik nayak wrote:

[snip]

> > +static int files_fsck_refs_content(struct ref_store *ref_store,
> > +				   struct fsck_options *o,
> > +				   const char *refs_check_dir,
> > +				   struct dir_iterator *iter)
> > +{
> > +	struct strbuf ref_content = STRBUF_INIT;
> > +	struct strbuf referent = STRBUF_INIT;
> > +	struct strbuf refname = STRBUF_INIT;
> > +	struct fsck_ref_report report = {0};
> > +	const char *trailing = NULL;
> > +	unsigned int type = 0;
> > +	int failure_errno = 0;
> > +	struct object_id oid;
> > +	int ret = 0;
> > +
> > +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> > +	report.path = refname.buf;
> > +
> > +	if (S_ISLNK(iter->st.st_mode))
> > +		goto cleanup;
> 
> Since we iterate over all refs, we don't need to check the target for a
> symbolic link. So we skip all symbolic links. Makes sense. Would be nice
> to have a comment here.
> 

While handling the reviews today, I noticed a misunderstanding here.
It's correct that "we don't need to check the target for a symbolic
link", but we do need to check the symbolic links themselves, because
one might be a symlink symref. Here we just skip it for now; that check
is implemented in a later patch.



* [PATCH v4 0/5] add ref content check for files backend
  2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
                         ` (3 preceding siblings ...)
  2024-09-03 12:21       ` [PATCH v3 4/4] ref: add symlink ref " shejialuo
@ 2024-09-13 17:14       ` shejialuo
  2024-09-13 17:17         ` [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero shejialuo
                           ` (6 more replies)
  4 siblings, 7 replies; 209+ messages in thread
From: shejialuo @ 2024-09-13 17:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This version mainly focuses on improving commit messages and comments,
and fixes some minor problems.

1. Split [PATCH v3 2/4] into two commits, [PATCH v4 2/5] and [PATCH v4
3/5]. [PATCH v4 2/5] ports the "git-fsck(1)" checks, and [PATCH v4 3/5]
tightens the rules for refs with trailing garbage and refs without a
trailing newline.

2. Fix many typos in the original [PATCH v3 2/4] and improve the
fsck-msgids documentation.

3. Improve [PATCH v4 4/5]'s commit message to introduce the tightened
rules first, consistent with [PATCH v4 3/5].

4. Remove the "badSymrefTarget(ERROR)" fsck message and add three more
specific messages instead:

  1. badReferentFiletype(ERROR): The referent of a symref has a bad file
  type.

  2. badReferentName(ERROR): The referent name of a symref is invalid.

  3. escapeReferent(ERROR): The referent of a symref is outside the
  ref directory.

5. Handle typos and some minor problems.
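The three new IDs are declared through the `FUNC(ID, SEVERITY)` list in fsck.h (see the interdiff), and the camelCase names used in messages and documentation correspond to those upper-case IDs. The sketch below illustrates that X-macro pattern with hypothetical macro, table, and function names; the underscore-to-camelCase conversion only illustrates the naming correspondence and is not git's fsck.c code.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list in the style of fsck.h: FUNC(ID, SEVERITY). */
#define FOREACH_REFERENT_MSG(FUNC) \
	FUNC(BAD_REFERENT_FILETYPE, ERROR) \
	FUNC(BAD_REFERENT_NAME, ERROR) \
	FUNC(ESCAPE_REFERENT, ERROR)

enum msg_severity { SEV_INFO, SEV_WARN, SEV_ERROR };

/* Expansion 1: one enum constant per ID. */
#define MSG_ENUM(id, sev) MSG_##id,
enum msg_id { FOREACH_REFERENT_MSG(MSG_ENUM) MSG_MAX };
#undef MSG_ENUM

/* Expansion 2: a table holding each ID's name and default severity. */
struct msg_info {
	const char *id;
	enum msg_severity severity;
};
#define MSG_INFO_ENTRY(id, sev) { #id, SEV_##sev },
static const struct msg_info msg_table[] = {
	FOREACH_REFERENT_MSG(MSG_INFO_ENTRY)
};
#undef MSG_INFO_ENTRY

/* "BAD_REFERENT_NAME" -> "badReferentName"; requires n > 0. */
static void to_camel_case(const char *id, char *out, size_t n)
{
	size_t j = 0;
	int upper_next = 0;

	for (; *id && j + 1 < n; id++) {
		if (*id == '_') {
			upper_next = 1;
			continue;
		}
		out[j++] = (char)(upper_next ? toupper((unsigned char)*id)
					     : tolower((unsigned char)*id));
		upper_next = 0;
	}
	out[j] = '\0';
}
```

Keeping the list in one macro means the enum and the name/severity table cannot drift apart when IDs are added or removed, as happens with `badSymrefTarget` in this version.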

Because I added more commits, I provide the "--interdiff" here to make
reviewers' lives easier.

However, because I have not merged the latest CI fixup, I cannot verify
some of the CI jobs. I may need Junio's help to verify them.

Thanks,
Jialuo

shejialuo (5):
  ref: initialize "fsck_ref_report" with zero
  ref: port git-fsck(1) regular refs check for files backend
  ref: add more strict checks for regular refs
  ref: add symref content check for files backend
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  25 +++
 fsck.h                        |   7 +
 refs.c                        |   2 +-
 refs/files-backend.c          | 202 +++++++++++++++++++-
 refs/refs-internal.h          |   2 +-
 t/t0602-reffiles-fsck.sh      | 334 ++++++++++++++++++++++++++++++++++
 6 files changed, 560 insertions(+), 12 deletions(-)

Interdiff against v3:
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 9e8e1ac7f0..31626e765b 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -20,7 +20,7 @@
 	(ERROR) A commit object has a bad parent sha1.
 
 `badRefContent`::
-	(ERROR) A ref has a bad content.
+	(ERROR) A ref has bad content.
 
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
@@ -28,9 +28,11 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
-`badSymrefTarget`::
-	(ERROR) The symref target points outside the ref directory or
-	the name of the symref target is invalid.
+`badReferentFiletype`::
+	(ERROR) The referent of a symref has a bad file type.
+
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
 
 `badTagName`::
 	(INFO) A tag has an invalid format.
@@ -53,6 +55,9 @@
 `emptyName`::
 	(WARN) A path contains an empty name.
 
+`escapeReferent`::
+	(ERROR) The referent of a symref is outside the "ref" directory.
+
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
@@ -178,8 +183,8 @@
 	(WARN) Tree contains entries pointing to a null sha1.
 
 `refMissingNewline`::
-	(INFO) A ref does not end with newline. This kind of ref may
-	be considered ERROR in the future.
+	(INFO) A ref does not end with newline. This will be
+	considered an error in the future.
 
 `symlinkRef`::
 	(INFO) A symref uses the symbolic link. This kind of symref may
@@ -187,8 +192,8 @@
 	symlink support.
 
 `trailingRefContent`::
-	(INFO) A ref has trailing contents. This kind of ref may be
-	considered ERROR in the future.
+	(INFO) A ref has trailing content. This will be
+	considered an error in the future.
 
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
diff --git a/fsck.h b/fsck.h
index 1c6f750812..b72ee632a4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,12 +34,14 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
-	FUNC(BAD_SYMREF_TARGET, ERROR) \
+	FUNC(BAD_REFERENT_FILETYPE, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
 	FUNC(BAD_TYPE, ERROR) \
 	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(ESCAPE_REFERENT, ERROR) \
 	FUNC(MISSING_AUTHOR, ERROR) \
 	FUNC(MISSING_COMMITTER, ERROR) \
 	FUNC(MISSING_EMAIL, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2a1b952f0d..c511deb509 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3449,14 +3449,12 @@ static int files_fsck_symref_target(struct fsck_options *o,
 				    unsigned int symbolic_link)
 {
 	size_t len = referent->len - 1;
-	const char *p = NULL;
 	struct stat st;
 	int ret = 0;
 
-	if (!skip_prefix(referent->buf, "refs/", &p)) {
-
+	if (!starts_with(referent->buf, "refs/")) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      FSCK_MSG_ESCAPE_REFERENT,
 				      "points to ref outside the refs directory");
 		goto out;
 	}
@@ -3473,7 +3471,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 
 	if (check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      FSCK_MSG_BAD_REFERENT_NAME,
 				      "points to refname with invalid format");
 		goto out;
 	}
@@ -3485,22 +3483,24 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	}
 
 	/*
-	 * Missing target should not be treated as any error worthy event and
-	 * not even warn. It is a common case that a symbolic ref points to a
-	 * ref that does not exist yet. If the target ref does not exist, just
-	 * skip the check for the file type.
+	 * Dangling symrefs are common and so we don't report them.
 	 */
-	if (lstat(referent_path->buf, &st))
+	if (lstat(referent_path->buf, &st)) {
+		if (errno != ENOENT) {
+			ret = error_errno(_("unable to stat '%s'"),
+					  referent_path->buf);
+		}
 		goto out;
+	}
 
 	/*
-	 * We cannot distinguish whether "refs/heads/a" is directory or nots by
+	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
 	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
 	 * check the file type of the target.
 	 */
 	if (S_ISDIR(st.st_mode)) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      FSCK_MSG_BAD_REFERENT_FILETYPE,
 				      "points to the directory");
 		goto out;
 	}
@@ -3520,7 +3520,6 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
-	unsigned int symbolic_link = 0;
 	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
@@ -3533,7 +3532,6 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	if (S_ISLNK(iter->st.st_mode)) {
 		const char* relative_referent_path;
 
-		symbolic_link = 1;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_SYMLINK_REF,
 				      "use deprecated symbolic link for symref");
@@ -3549,21 +3547,20 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				 abs_gitdir.buf,
 				 &relative_referent_path)) {
 			ret = fsck_report_ref(o, &report,
-					      FSCK_MSG_BAD_SYMREF_TARGET,
+					      FSCK_MSG_ESCAPE_REFERENT,
 					      "point to target outside gitdir");
 			goto cleanup;
 		}
 
 		strbuf_addstr(&referent, relative_referent_path);
 		ret = files_fsck_symref_target(o, &report,
-					       &referent, &referent_path,
-					       symbolic_link);
+					       &referent, &referent_path, 1);
 
 		goto cleanup;
 	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
-		ret = error_errno(_("%s/%s: unable to read the ref"),
+		ret = error_errno(_("unable to read ref '%s/%s'"),
 				  refs_check_dir, iter->relative_path);
 		goto cleanup;
 	}
@@ -3578,14 +3575,14 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	}
 
 	if (!(type & REF_ISSYMREF)) {
-		if (*trailing == '\0') {
+		if (!*trailing) {
 			ret = fsck_report_ref(o, &report,
 					      FSCK_MSG_REF_MISSING_NEWLINE,
 					      "missing newline");
 			goto cleanup;
 		}
 
-		if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
+		if (*trailing != '\n' || *(trailing + 1)) {
 			ret = fsck_report_ref(o, &report,
 					      FSCK_MSG_TRAILING_REF_CONTENT,
 					      "trailing garbage in ref");
@@ -3602,7 +3599,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		ret = files_fsck_symref_target(o, &report,
 					       &referent,
 					       &referent_path,
-					       symbolic_link);
+					       0);
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index e735816d5b..7c3579705f 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -268,7 +268,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
 	EOF
 	rm $branch_dir_prefix/branch-bad-1 &&
 	test_cmp expect err &&
@@ -276,7 +276,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
 	EOF
 	rm $branch_dir_prefix/branch-bad-2 &&
 	test_cmp expect err &&
@@ -284,7 +284,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
 	EOF
 	rm $branch_dir_prefix/branch-bad-3 &&
 	test_cmp expect err
@@ -311,9 +311,9 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
-	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
-	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
 	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
 	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
 	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
@@ -347,7 +347,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
-	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
 	EOF
 	rm $branch_dir_prefix/branch-symbolic-1 &&
 	test_cmp expect err &&
@@ -356,7 +356,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
-	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
 	EOF
 	rm $branch_dir_prefix/branch-symbolic-2 &&
 	test_cmp expect err &&
@@ -365,7 +365,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
-	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
 	EOF
 	rm $branch_dir_prefix/branch-symbolic-3 &&
 	test_cmp expect err &&
@@ -374,7 +374,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
-	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
 	EOF
 	rm $tag_dir_prefix/tag-symbolic-1 &&
 	test_cmp expect err &&
@@ -383,7 +383,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
-	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
 	EOF
 	rm $tag_dir_prefix/tag-symbolic-2 &&
 	test_cmp expect err
@@ -407,11 +407,11 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (aggregat
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
-	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
-	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
-	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
-	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
 	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
 	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
 	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 209+ messages in thread

* [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
@ 2024-09-13 17:17         ` shejialuo
  2024-09-18 16:41           ` Junio C Hamano
  2024-09-13 17:17         ` [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend shejialuo
                           ` (5 subsequent siblings)
  6 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-13 17:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So, we need to always initialize these parameters to
NULL instead of letting them point anywhere when creating a new
"fsck_ref_report" structure.

The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL (which implicitly 0-initializes other
members in the struct). It is more customary to use "{ 0 }" to express
that we are 0-initializing everything. To align with the rest of the
codebase, initialize "fsck_ref_report" with zero.
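A small illustration of the C rule this relies on (a sketch, not code from the patch): both initializers below leave every member NULL, because members not named in a brace initializer are zero-initialized. The struct is a hypothetical, simplified stand-in for "struct fsck_ref_report"; its member types are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "struct fsck_ref_report"; members are assumed. */
struct ref_report_sketch {
	const char *path;
	const void *oid;      /* "const struct object_id *" in Git, assumed */
	const char *referent;
};

/* Old style: only "path" is named, yet C zero-initializes the rest too. */
static struct ref_report_sketch make_report_designated(void)
{
	struct ref_report_sketch report = { .path = NULL };
	return report;
}

/* New style: "{ 0 }" says explicitly that every member starts as 0/NULL. */
static struct ref_report_sketch make_report_zero(void)
{
	struct ref_report_sketch report = { 0 };
	return report;
}
```

The behavior is identical; "{ 0 }" simply states the intent that "oid" and "referent" must start out NULL.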

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8d6ec9458d..890d0324e1 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3446,7 +3446,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.46.0


* [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
  2024-09-13 17:17         ` [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-09-13 17:17         ` shejialuo
  2024-09-18 18:59           ` Junio C Hamano
  2024-09-13 17:17         ` [PATCH v4 3/5] ref: add more strict checks for regular refs shejialuo
                           ` (4 subsequent siblings)
  6 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-13 17:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We implicitly rely on "git-fsck(1)" to check the consistency of regular
refs. However, we have already set up the infrastructure for ref
consistency checks, so we need to port the original checks from
"git-fsck(1)". This will allow us to clean up the "git-fsck(1)" code by
removing these implicit checks.

"git-fsck(1)" reports an error when the ref content is invalid.
Following this, add a similar check to "git refs verify", together with
a new fsck error message, "badRefContent(ERROR)", to report that a ref
has invalid content.
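A rough sketch of what counts as invalid content for a regular loose ref, assuming SHA-1 (40 hex digits) and simplifying what "parse_loose_ref_contents" does: the content must start with a full hex object ID followed by end-of-string or whitespace; anything else would be flagged as "badRefContent". The helper name "loose_ref_content_ok" is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define SKETCH_HEXSZ 40 /* assumption: SHA-1 object IDs only */

/*
 * Hypothetical, simplified check. Returns 0 if the content is acceptable
 * and -1 if it would be reported as "badRefContent". For regular refs,
 * "*trailing" is set to the first byte after the object ID.
 */
static int loose_ref_content_ok(const char *buf, const char **trailing)
{
	size_t i;

	if (!strncmp(buf, "ref:", 4))
		return 0; /* symrefs are validated separately */

	/* stops at '\0' too, since '\0' is not a hex digit */
	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return -1;

	/* the object ID must be followed by end of string or whitespace */
	if (buf[i] != '\0' && !isspace((unsigned char)buf[i]))
		return -1;

	if (trailing)
		*trailing = buf + i;
	return 0;
}
```

The rejected inputs in the test mirror the "%sx" and "xfsazqfxcadas" cases used in t0602.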

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 43 +++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 60 +++++++++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 890d0324e1..b1ed2e5c04 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3430,6 +3430,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	struct fsck_ref_report report = {0};
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = error_errno(_("unable to read ref '%s/%s'"),
+				  refs_check_dir, iter->relative_path);
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "invalid ref content");
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3554,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..a1205b3a3b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,64 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-1 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-2 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0


* [PATCH v4 3/5] ref: add more strict checks for regular refs
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
  2024-09-13 17:17         ` [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-09-13 17:17         ` [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-09-13 17:17         ` shejialuo
  2024-09-18 19:39           ` Junio C Hamano
  2024-09-13 17:18         ` [PATCH v4 4/5] ref: add symref content check for files backend shejialuo
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-13 17:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We already use the "parse_loose_ref_contents" function to check whether
the ref content is valid in the files backend. However,
"parse_loose_ref_contents" allows the ref's content to end with
garbage or without a newline.

Even though we never create such loose refs ourselves, we have always
accepted them, so it is entirely possible that some third-party tools
rely on such loose refs being valid. We should therefore not report an
error fsck message for now. Instead, we should notify the users about
such "curiously formatted" loose refs so that adequate care is taken
before we decide to tighten the rules in the future.

Reporting a warn fsck message is not suitable either, because the
"--strict" flag turns fsck warnings into errors. We don't yet want it
to end up generating errors for such weirdly-formatted reference
contents, as we first want to assess whether this retroactive
tightening will cause issues for any tools out there; it may cause
compatibility issues that break existing repositories. So we add the
following two fsck infos to represent the situation where the ref
content ends without a newline or has trailing garbage:

1. refMissingNewline(INFO): A ref does not end with a newline. This will
   be considered an error in the future.
2. trailingRefContent(INFO): A ref has trailing content. This will be
   considered an error in the future.
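The two INFO cases can be told apart from the "trailing" pointer alone, in the same order as the new check in files_fsck_refs_content(). A minimal standalone sketch; the enum and function names are hypothetical:

```c
#include <assert.h>

enum trailing_verdict {
	TRAILING_OK,          /* exactly one '\n' after the object ID */
	TRAILING_NO_NEWLINE,  /* would be reported as refMissingNewline */
	TRAILING_GARBAGE,     /* would be reported as trailingRefContent */
};

/* "trailing" points just past the object ID of a regular ref. */
static enum trailing_verdict classify_trailing(const char *trailing)
{
	if (!*trailing)
		return TRAILING_NO_NEWLINE;
	if (*trailing != '\n' || *(trailing + 1))
		return TRAILING_GARBAGE;
	return TRAILING_OK;
}
```

Because the missing-newline check only fires on an empty tail, content such as "<oid> garbage" is reported only as trailing garbage, matching the expectations in t0602.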

It might appear that using FSCK_INFO means we cannot warn the user at
all. However, "fsck.c::fsck_vreport" converts FSCK_INFO to FSCK_WARN,
so "git refs verify" can still warn the user about these situations
without introducing compatibility issues.
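A sketch of the severity handling described above, using a hypothetical enum that mirrors the FSCK_* levels (not Git's actual types). It assumes, as the t0602 tests suggest, that only ERROR-level reports make "git refs verify" exit non-zero, and that "fsck.<msg-id>" configuration (for example "fsck.trailingRefContent=error") replaces the default level before the INFO-to-WARN promotion.

```c
#include <assert.h>

/* Hypothetical mirror of the FSCK_* message levels; not Git's own enum. */
enum sketch_msg_type { SKETCH_INFO, SKETCH_WARN, SKETCH_ERROR };

/*
 * Level that ends up being reported. "configured" is the message's
 * default level, or the level set via "fsck.<msg-id>" configuration.
 * Per the commit message, fsck_vreport() promotes INFO to WARN.
 */
static enum sketch_msg_type reported_level(enum sketch_msg_type configured)
{
	return configured == SKETCH_INFO ? SKETCH_WARN : configured;
}

/* Assumption based on t0602: only ERROR-level reports fail the command. */
static int verify_fails(const enum sketch_msg_type *reports, int n)
{
	for (int i = 0; i < n; i++)
		if (reported_level(reports[i]) == SKETCH_ERROR)
			return 1;
	return 0;
}
```

In a default run a trailingRefContent report is only a warning and the command succeeds; with "fsck.trailingRefContent=error" it becomes an error, as the tag-garbage-4 case in t0602 expects.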

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  8 +++++
 fsck.h                        |  2 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 27 ++++++++++++++--
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 60 +++++++++++++++++++++++++++++++++++
 6 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 22c385ea22..8827137ef0 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -173,6 +173,14 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A ref does not end with newline. This will be
+	considered an error in the future.
+
+`trailingRefContent`::
+	(INFO) A ref has trailing content. This will be
+	considered an error in the future.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 0d99a87911..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -85,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 74de3d3009..5e74881945 100644
--- a/refs.c
+++ b/refs.c
@@ -1758,7 +1758,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b1ed2e5c04..df4ce270ae 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -560,7 +560,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -597,7 +597,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -619,6 +619,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3439,6 +3443,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
+	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
 	struct object_id oid;
@@ -3458,13 +3463,29 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
 				     ref_content.buf, &oid, &referent,
-				     &type, &failure_errno)) {
+				     &type, &trailing, &failure_errno)) {
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
 				      "invalid ref content");
 		goto cleanup;
 	}
 
+	if (!(type & REF_ISSYMREF)) {
+		if (!*trailing) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "missing newline");
+			goto cleanup;
+		}
+
+		if (*trailing != '\n' || *(trailing + 1)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "trailing garbage in ref");
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index a1205b3a3b..a06ad044f2 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -101,6 +101,54 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	git refs verify 2>err &&
 	test_must_be_empty err &&
 
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/branch-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-1 &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-2 &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-3 &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-4 &&
+	test_cmp expect err &&
+
 	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
@@ -135,6 +183,12 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_commit default &&
 	mkdir -p "$branch_dir_prefix/a/b" &&
 
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
 	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
 	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
 	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
@@ -144,6 +198,12 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
 	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
 	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
+	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
-- 
2.46.0


* [PATCH v4 4/5] ref: add symref content check for files backend
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
                           ` (2 preceding siblings ...)
  2024-09-13 17:17         ` [PATCH v4 3/5] ref: add more strict checks for regular refs shejialuo
@ 2024-09-13 17:18         ` shejialuo
  2024-09-18 20:19           ` Junio C Hamano
  2024-09-13 17:18         ` [PATCH v4 5/5] ref: add symlink ref " shejialuo
                           ` (2 subsequent siblings)
  6 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-13 17:18 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced the checks for regular refs. There is no need
to check the consistency of the target which the symref points to.
Instead, we just need to check the content of the symref itself.

A regular file is accepted as a textual symref if it begins with
"ref:", followed by zero or more whitespace characters, followed by the
full refname, followed only by whitespace characters. We always write
a single SP after "ref:" and a single LF after the refname, but
third-party reimplementations of Git may have taken advantage of the
looser syntax. More specifically, we accept symref contents such as the
following:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

Thus, we can reuse the "refMissingNewline" and "trailingRefContent"
FSCK_INFO messages to apply the same retroactive tightening that we
introduced for regular references.

But we do not allow any other trailing garbage. The following are bad
symref contents, which will be reported as fsck errors by
"git-fsck(1)":

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

We introduce a new "badReferentName(ERROR)" fsck message to report the
above errors to the user.
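To make the accepted and rejected forms above concrete, here is a minimal sketch that classifies symref content by the grammar described in this message. It is not the patch's implementation: the patch detects garbage indirectly, since after "strbuf_rtrim()" a referent like "refs/heads/master garbage" still contains a space and therefore fails "check_refname_format". The enum and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum symref_verdict {
	SYMREF_OK,         /* refname followed by exactly one LF */
	SYMREF_LOOSE,      /* accepted, but with INFO-level whitespace oddities */
	SYMREF_GARBAGE,    /* non-whitespace after the refname: an error */
	SYMREF_NOT_SYMREF  /* does not start with "ref:" */
};

static enum symref_verdict classify_symref(const char *buf)
{
	const char *name, *end, *q;

	if (strncmp(buf, "ref:", 4))
		return SYMREF_NOT_SYMREF;

	/* zero or more whitespace characters after "ref:" */
	name = buf + 4;
	while (isspace((unsigned char)*name))
		name++;

	/* the refname itself runs up to the next whitespace character */
	end = name;
	while (*end && !isspace((unsigned char)*end))
		end++;

	/* only whitespace may follow the refname */
	for (q = end; *q; q++)
		if (!isspace((unsigned char)*q))
			return SYMREF_GARBAGE;

	if (end[0] == '\n' && end[1] == '\0')
		return SYMREF_OK;
	return SYMREF_LOOSE;
}
```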

To check the content of the symref, create a function
"files_fsck_symref_target". It first checks whether the "referent" is
under the "refs/" directory; if not, it reports an "escapeReferent"
fsck error to notify the user of this situation.
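A standalone sketch of this first step, plus the path the patch builds for the later file-type check ("<gitdir>/<referent>" with trailing whitespace removed). Plain C string handling stands in for Git's "strbuf" and "starts_with()"; the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 when the referent stays under "refs/", 0 when it escapes. */
static int referent_inside_refs(const char *referent)
{
	return strncmp(referent, "refs/", 5) == 0;
}

/*
 * Build "<gitdir>/<referent>" and strip trailing whitespace, similar to
 * strbuf_addf() followed by strbuf_rtrim(). Returns -1 if "out" is too
 * small, 0 otherwise.
 */
static int build_referent_path(char *out, size_t outsz,
			       const char *gitdir, const char *referent)
{
	int n = snprintf(out, outsz, "%s/%s", gitdir, referent);
	size_t len;

	if (n < 0 || (size_t)n >= outsz)
		return -1;
	len = (size_t)n;
	while (len && isspace((unsigned char)out[len - 1]))
		out[--len] = '\0';
	return 0;
}
```

The "reflogs/heads/main" input in the test mirrors the branch-bad-2 case in t0602, which is reported as "escapeReferent".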

Next, it checks whether the symref content is missing the trailing
newline by peeking at the last byte of the "referent" to see whether it
is '\n'.

It then remembers the untrimmed length of the "referent" (minus the one
expected '\n', if present) and calls "strbuf_rtrim()" on "referent".
Then it calls "check_refname_format" to check whether the trimmed
referent is a valid refname. If not, it reports that the symref points
to a referent with an invalid format. If it is valid, it compares the
remembered length with the trimmed length; if they differ, it warns the
user that there is trailing garbage in the symref content.
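The length bookkeeping is easy to get wrong, so here is a standalone sketch of it, following the "len = referent->len - 1" logic in files_fsck_symref_target(). It omits the "check_refname_format" step and uses plain C strings instead of "strbuf"; the struct and function names are hypothetical. The test inputs mirror the t0602 cases.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct symref_trailing_result {
	int missing_newline;  /* would be reported as refMissingNewline */
	int trailing_garbage; /* would be reported as trailingRefContent */
};

/*
 * "referent" is the text after "ref:" and its leading whitespace; it is
 * trimmed in place. Assumes a non-empty referent, since anything not
 * starting with "refs/" has already been rejected by the escape check.
 */
static struct symref_trailing_result check_symref_trailing(char *referent)
{
	struct symref_trailing_result r = { 0, 0 };
	size_t orig_len = strlen(referent);
	size_t expected = orig_len - 1; /* discount one expected '\n' */
	size_t trimmed = orig_len;

	if (referent[orig_len - 1] != '\n') {
		r.missing_newline = 1;
		expected++; /* there was no newline to discount */
	}

	while (trimmed && isspace((unsigned char)referent[trimmed - 1]))
		trimmed--;
	referent[trimmed] = '\0';

	/* anything besides that single '\n' means trailing content */
	r.trailing_garbage = (expected != trimmed);
	return r;
}
```

Note that "refs/heads/branch     " triggers both reports, just like the branch-trailing-1 case in the tests.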

Finally, we need to check whether the referent is a directory. We cannot
tell whether a given reference like "refs/heads/a" is a file or a
directory by using "check_refname_format". We already check for bad
file types when iterating over the "refs/" directory, but that
iteration ignores directories. Thus, we need to add an explicit check
here.
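A minimal POSIX sketch of this final step, assuming the path has already been built as "<gitdir>/<referent>". As in the patch, a missing referent (ENOENT) is treated as a dangling symref and not reported; the patch reports other "lstat()" failures via "error_errno()". The enum and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

enum referent_type_verdict {
	REFERENT_DANGLING,   /* ENOENT: common, not reported */
	REFERENT_FILE,       /* anything that is not a directory */
	REFERENT_DIRECTORY,  /* would be reported as badReferentFiletype */
	REFERENT_STAT_ERROR  /* any other lstat() failure */
};

static enum referent_type_verdict check_referent_type(const char *path)
{
	struct stat st;

	/* lstat() so that a symlink is inspected itself, not followed */
	if (lstat(path, &st))
		return errno == ENOENT ? REFERENT_DANGLING : REFERENT_STAT_ERROR;
	return S_ISDIR(st.st_mode) ? REFERENT_DIRECTORY : REFERENT_FILE;
}
```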

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   9 +++
 fsck.h                        |   3 +
 refs/files-backend.c          |  81 +++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 117 ++++++++++++++++++++++++++++++++++
 4 files changed, 210 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 8827137ef0..03bcb77972 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,12 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferentFiletype`::
+	(ERROR) The referent of a symref has a bad file type.
+
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
@@ -49,6 +55,9 @@
 `emptyName`::
 	(WARN) A path contains an empty name.
 
+`escapeReferent`::
+	(ERROR) The referent of a symref is outside the "ref" directory.
+
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..c90561c6db 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,11 +34,14 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT_FILETYPE, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
 	FUNC(BAD_TYPE, ERROR) \
 	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(ESCAPE_REFERENT, ERROR) \
 	FUNC(MISSING_AUTHOR, ERROR) \
 	FUNC(MISSING_COMMITTER, ERROR) \
 	FUNC(MISSING_EMAIL, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index df4ce270ae..0cb4a2da73 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3434,11 +3434,80 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+/*
+ * Check the symref "referent" and "referent_path". For textual symref,
+ * "referent" would be the content after "ref:".
+ */
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent,
+				    struct strbuf *referent_path)
+{
+	size_t len = referent->len - 1;
+	struct stat st;
+	int ret = 0;
+
+	if (!starts_with(referent->buf, "refs/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_ESCAPE_REFERENT,
+				      "points to ref outside the refs directory");
+		goto out;
+	}
+
+	if (referent->buf[referent->len - 1] != '\n') {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "missing newline");
+		len++;
+	}
+
+	strbuf_rtrim(referent);
+	if (check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_NAME,
+				      "points to refname with invalid format");
+		goto out;
+	}
+
+	if (len != referent->len) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "trailing garbage in ref");
+	}
+
+	/*
+	 * Dangling symrefs are common and so we don't report them.
+	 */
+	if (lstat(referent_path->buf, &st)) {
+		if (errno != ENOENT) {
+			ret = error_errno(_("unable to stat '%s'"),
+					  referent_path->buf);
+		}
+		goto out;
+	}
+
+	/*
+	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
+	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
+	 * check the file type of the target.
+	 */
+	if (S_ISDIR(st.st_mode)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_FILETYPE,
+				      "points to the directory");
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
 				   struct dir_iterator *iter)
 {
+	struct strbuf referent_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
@@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "trailing garbage in ref");
 			goto cleanup;
 		}
+	} else {
+		strbuf_addf(&referent_path, "%s/%s",
+			    ref_store->gitdir, referent.buf);
+		/*
+		 * The referent may contain trailing spaces and newlines;
+		 * trim them to get the path.
+		 */
+		strbuf_rtrim(&referent_path);
+		ret = files_fsck_symref_target(o, &report,
+					       &referent,
+					       &referent_path);
 	}
 
 cleanup:
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&referent_path);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index a06ad044f2..9580c340ab 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -209,4 +209,121 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-no-newline-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err &&
+
+	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-bad-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
+	EOF
+	rm $branch_dir_prefix/branch-bad-3 &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
+	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0


* [PATCH v4 5/5] ref: add symlink ref content check for files backend
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
                           ` (3 preceding siblings ...)
  2024-09-13 17:18         ` [PATCH v4 4/5] ref: add symref content check for files backend shejialuo
@ 2024-09-13 17:18         ` shejialuo
  2024-09-18 23:02           ` Junio C Hamano
  2024-09-18 16:49         ` [PATCH v4 0/5] add " Junio C Hamano
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
  6 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-13 17:18 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced "files_fsck_symref_target". We should reuse
this function to handle symrefs that are stored as legacy symbolic
links. Symlink refs have no newline or trailing garbage to check, so add
a new parameter "symbolic_link" to disable the checks which should only
be executed for textual symrefs.
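A minimal standalone sketch of the newline and trailing-garbage logic in "files_fsck_symref_target", including the early exit for symlink refs described above. It assumes plain C strings in place of strbufs and uses hypothetical flag names for the fsck messages; it omits "check_refname_format" and the lstat() step that the real function still runs for symlinks, and isspace() only approximates strbuf_rtrim().

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flags standing in for the fsck message IDs. */
enum {
	SYMREF_OK = 0,
	SYMREF_ESCAPE = 1 << 0,          /* escapeReferent */
	SYMREF_MISSING_NEWLINE = 1 << 1, /* refMissingNewline */
	SYMREF_TRAILING = 1 << 2,        /* trailingRefContent */
};

/*
 * "referent" is the content after "ref:" with leading whitespace
 * already skipped; it is right-trimmed in place.
 */
static int check_symref_target(char *referent, int symbolic_link)
{
	size_t cur = strlen(referent);
	size_t expected;
	int flags = SYMREF_OK;

	/* escapeReferent: the target must live under refs/ */
	if (strncmp(referent, "refs/", 5))
		return SYMREF_ESCAPE;

	/* a symlink target has no newline or padding to inspect */
	if (symbolic_link)
		return flags;

	/* expect exactly one terminating LF; cur >= 5 here, no underflow */
	expected = cur - 1;
	if (referent[cur - 1] != '\n') {
		flags |= SYMREF_MISSING_NEWLINE;
		expected = cur;
	}

	/* roughly strbuf_rtrim(): drop all trailing whitespace */
	while (cur > 0 && isspace((unsigned char)referent[cur - 1]))
		referent[--cur] = '\0';

	/* anything trimmed beyond the single LF was trailing content */
	if (cur != expected)
		flags |= SYMREF_TRAILING;
	return flags;
}
```

The inputs in the test mirror the textual symref cases in t0602 (with the "ref: " prefix already stripped).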

We first use "strbuf_add_real_path" to resolve the symlink and get the
absolute path "referent_path" that the symlink ref points to. Then we
get the absolute path "abs_gitdir" of the "gitdir". By stripping the
"abs_gitdir" prefix from "referent_path", we extract the "referent";
if "referent_path" does not start with "abs_gitdir", the symlink points
outside the gitdir and we report "escapeReferent". Thus, we can reuse
the "files_fsck_symref_target" function to seamlessly check symlink refs.
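The referent extraction described above boils down to a prefix strip. Below is a hypothetical helper, assuming both paths have already been made absolute and normalized (as "strbuf_add_real_path" and "strbuf_normalize_path" do in the patch) and that "abs_gitdir" already ends with '/'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the part of "real_path" that follows
 * "abs_gitdir" (which must end with '/'), or NULL when the symlink
 * resolves outside the gitdir (the escapeReferent case).
 */
static const char *referent_from_real_path(const char *real_path,
					   const char *abs_gitdir)
{
	size_t n = strlen(abs_gitdir);

	if (strncmp(real_path, abs_gitdir, n))
		return NULL;
	return real_path + n;
}
```

The trailing '/' is what keeps a sibling directory such as "/tmp/repo/.gitfoo" from being mistaken for a path inside "/tmp/repo/.git". A target like "logs/branch-bad" passes this step and is then rejected by the "refs/" check.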

We are considering deprecating writing symrefs as symbolic links, but
have not decided whether reading them should be deprecated as well; we
first need to assess whether symbolic links are still in use. So, add a
new fsck message "symlinkRef(INFO)" to make the user aware of this.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  5 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 65 ++++++++++++++++++-----
 t/t0602-reffiles-fsck.sh      | 97 +++++++++++++++++++++++++++++++++++
 4 files changed, 154 insertions(+), 14 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 03bcb77972..31626e765b 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -186,6 +186,11 @@
 	(INFO) A ref does not end with newline. This will be
 	considered an error in the future.
 
+`symlinkRef`::
+	(INFO) A symref is stored as a symbolic link. This kind of
+	symref may be considered an error in the future, once symlink
+	support is dropped entirely.
+
 `trailingRefContent`::
 	(INFO) A ref has trailing content. This will be
 	considered an error in the future.
diff --git a/fsck.h b/fsck.h
index c90561c6db..b72ee632a4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -89,6 +89,7 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0cb4a2da73..c511deb509 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,4 +1,5 @@
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../copy.h"
 #include "../environment.h"
 #include "../gettext.h"
@@ -1950,10 +1951,13 @@ static int commit_ref_update(struct files_ref_store *refs,
 	return 0;
 }
 
+#ifdef NO_SYMLINK_HEAD
+#define create_ref_symlink(a, b) (-1)
+#else
 static int create_ref_symlink(struct ref_lock *lock, const char *target)
 {
 	int ret = -1;
-#ifndef NO_SYMLINK_HEAD
+
 	char *ref_path = get_locked_file_path(&lock->lk);
 	unlink(ref_path);
 	ret = symlink(target, ref_path);
@@ -1961,13 +1965,12 @@ static int create_ref_symlink(struct ref_lock *lock, const char *target)
 
 	if (ret)
 		fprintf(stderr, "no symlink - falling back to symbolic ref\n");
-#endif
 	return ret;
 }
+#endif
 
-static int create_symref_lock(struct files_ref_store *refs,
-			      struct ref_lock *lock, const char *refname,
-			      const char *target, struct strbuf *err)
+static int create_symref_lock(struct ref_lock *lock, const char *target,
+			      struct strbuf *err)
 {
 	if (!fdopen_lock_file(&lock->lk, "w")) {
 		strbuf_addf(err, "unable to fdopen %s: %s",
@@ -2583,8 +2586,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	}
 
 	if (update->new_target && !(update->flags & REF_LOG_ONLY)) {
-		if (create_symref_lock(refs, lock, update->refname,
-				       update->new_target, err)) {
+		if (create_symref_lock(lock, update->new_target, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
 			goto out;
 		}
@@ -3436,12 +3438,15 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 /*
  * Check the symref "referent" and "referent_path". For textual symref,
- * "referent" would be the content after "refs:".
+ * "referent" would be the content after "ref:". For symlink ref,
+ * "referent" would be the path relative to "gitdir", which should
+ * literally match the referent of the equivalent textual symref.
  */
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    struct strbuf *referent,
-				    struct strbuf *referent_path)
+				    struct strbuf *referent_path,
+				    unsigned int symbolic_link)
 {
 	size_t len = referent->len - 1;
 	struct stat st;
@@ -3454,14 +3459,16 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	if (referent->buf[referent->len - 1] != '\n') {
+	if (!symbolic_link && referent->buf[referent->len - 1] != '\n') {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_REF_MISSING_NEWLINE,
 				      "missing newline");
 		len++;
 	}
 
-	strbuf_rtrim(referent);
+	if (!symbolic_link)
+		strbuf_rtrim(referent);
+
 	if (check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT_NAME,
@@ -3469,7 +3476,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	if (len != referent->len) {
+	if (!symbolic_link && len != referent->len) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_TRAILING_REF_CONTENT,
 				      "trailing garbage in ref");
@@ -3509,6 +3516,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 {
 	struct strbuf referent_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
@@ -3521,8 +3529,35 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
 	report.path = refname.buf;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char* relative_referent_path;
+
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&referent_path, iter->path.buf);
+
+		if (!skip_prefix(referent_path.buf,
+				 abs_gitdir.buf,
+				 &relative_referent_path)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_ESCAPE_REFERENT,
+					      "point to target outside gitdir");
+			goto cleanup;
+		}
+
+		strbuf_addstr(&referent, relative_referent_path);
+		ret = files_fsck_symref_target(o, &report,
+					       &referent, &referent_path, 1);
+
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		ret = error_errno(_("unable to read ref '%s/%s'"),
@@ -3563,7 +3598,8 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		strbuf_rtrim(&referent_path);
 		ret = files_fsck_symref_target(o, &report,
 					       &referent,
-					       &referent_path);
+					       &referent_path,
+					       0);
 	}
 
 cleanup:
@@ -3571,6 +3607,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
 	strbuf_release(&referent_path);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 9580c340ab..7c3579705f 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -326,4 +326,101 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-1 &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-2 &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-3 &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err &&
+
+	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-2 &&
+	test_cmp expect err
+'
+
+test_expect_success SYMLINKS 'symlink symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
+	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0


* Re: [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero
  2024-09-13 17:17         ` [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-09-18 16:41           ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-09-18 16:41 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
> "referent" is NULL. So, we need to always initialize these parameters to
> NULL instead of letting them point to anywhere when creating a new
> "fsck_ref_report" structure.
>
> The original code explicitly initializes the "path" member in the
> "struct fsck_ref_report" to NULL (which implicitly 0-initializes other
> members in the struct). It is more customary to use " {0} " to express

" {0} " -> "{ 0 }" 

> that we are 0-initializing everything. In order to be align with the the

"be align with the the" -> "align with the"

> codebase, initialize "fsck_ref_report" with zero.

Both I'll amend in-place so no need to reroll just for these.

Thanks.
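For illustration, the equivalence the commit message relies on: in C, members not named in an initializer are zero-initialized, so both forms below leave every pointer NULL. The struct is hypothetical and only loosely modeled on "struct fsck_ref_report"; the real member types may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_ref_report. */
struct report_like {
	const char *path;
	const void *oid;
	const char *referent;
};

/* Designated initializer: unnamed members are zero-initialized. */
static struct report_like make_designated(void)
{
	struct report_like r = { .path = NULL };
	return r;
}

/* The customary spelling: first member gets 0, the rest are zeroed. */
static struct report_like make_zero(void)
{
	struct report_like r = { 0 };
	return r;
}
```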

>
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  refs/files-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8d6ec9458d..890d0324e1 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3446,7 +3446,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
>  		goto cleanup;
>  
>  	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
> -		struct fsck_ref_report report = { .path = NULL };
> +		struct fsck_ref_report report = { 0 };
>  
>  		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
>  		report.path = sb.buf;

* Re: [PATCH v4 0/5] add ref content check for files backend
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
                           ` (4 preceding siblings ...)
  2024-09-13 17:18         ` [PATCH v4 5/5] ref: add symlink ref " shejialuo
@ 2024-09-18 16:49         ` Junio C Hamano
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
  6 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-09-18 16:49 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> Because I add more commits, I provide the "--interdiff" here to make the
> reviewer's life easier.

Yeah, for the changes from the previous iteration of this series,
range-diff comparison is pretty much useless.  Interdiff is indeed
more usable, but essentially this iteration deserves reviews with
fresh sets of eyes.

> However, because I have not merged the latest ci fixup, so I cannot
> verify some jobs in CIs. May need the help from Junio to verify.

A good way to do so is to fork a temporary branch at the tip of
these 5 commits, and then either merge or cherry-pick the CI fixup.
Such a temporary branch should be usable for CI testing, right?

Thanks.

PS.

I am not feeling well today; please expect delayed and/or sparse
responses.


* Re: [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend
  2024-09-13 17:17         ` [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-09-18 18:59           ` Junio C Hamano
  2024-09-22 14:58             ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-09-18 18:59 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> We implicitly rely on "git-fsck(1)" to check the consistency of regular
> refs. However, we have already set up the infrastructure of the ref
> consistency checks. We need to port original checks from "git-fsck(1)".
> Thus, we could clean the "git-fsck(1)" code by removing these implicit
> checks.

The above reads as if, in preparation to "port" the checks we have
in "fsck" to elsewhere (presumably to "refs verify"), you are
removing the checks that _will_ become redundant from "fsck".

But that does not seem to be what is happening.  Let me try to
paraphrase, in order to check my understanding of what you wanted to
say:

    "git-fsck(1) has some consistency checks for regular refs.  As
    we want to align the checks "git refs verify" performs with
    them (and eventually call the unified code that checks refs from
    both), port the logic "git fsck" has to "git refs verify".

If we fail to achieve "a single unified piece of checking code
called by both fsck and refs-verify" by the end of this series, and
instead end up with duplicated code that implements the checks in
two separate places, risking them becoming slightly different and
drifting apart over time, that is fine, as long as our intention is
to continue the unification effort in a follow-up series.

But such a plan needs to be spelled out.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 890d0324e1..b1ed2e5c04 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3430,6 +3430,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	struct fsck_ref_report report = {0};
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISLNK(iter->st.st_mode))
> +		goto cleanup;

"symbolic links are OK" for now.  We'll add sanity checks for them
in later steps.  OK.

> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = error_errno(_("unable to read ref '%s/%s'"),
> +				  refs_check_dir, iter->relative_path);

Is there a reason why we cannot use report.path aka refname.buf,
and instead have to recompute the same path?

Should this error be propagated back to the caller, not just to the
end-user, by a call to fsck_report_ref(), like you do for a ref file
that has questionable contents?  If ref iteration (like for-each-ref)
claims there is this ref, and you cannot read its value when you try
to use it, it is just as bad as having a loose ref file that has
unusable contents, isn't it?

It is a separate matter if such a failure mode deserves its own
error code (FSCK_MSG_UNREADABLE_REF) or can be rolled into the same
FSCK_MSG_BAD_REF_CONTENT.  I can see arguments for both sides and
offhand have no strong preference either way.

Thanks.

> +		goto cleanup;
> +	}
> +
> +	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
> +				     ref_content.buf, &oid, &referent,
> +				     &type, &failure_errno)) {
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_BAD_REF_CONTENT,
> +				      "invalid ref content");
> +		goto cleanup;
> +	}
> +
> +cleanup:
> +	strbuf_release(&refname);
> +	strbuf_release(&ref_content);
> +	strbuf_release(&referent);
> +	return ret;
> +}

* Re: [PATCH v4 3/5] ref: add more strict checks for regular refs
  2024-09-13 17:17         ` [PATCH v4 3/5] ref: add more strict checks for regular refs shejialuo
@ 2024-09-18 19:39           ` Junio C Hamano
  2024-09-22 15:06             ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-09-18 19:39 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> +`refMissingNewline`::
> +	(INFO) A ref does not end with newline. This will be
> +	considered an error in the future.

Only the files backend's loose-ref representation stores the object
name that is the value of the ref as hexadecimal text terminated
with a newline.  With the packed backend, even if the file ends with
an incomplete line, it would be confusing to say that such a lack of
terminating LF is associated with a particular ref.  With the
reftable backend, the object name may not even be hexadecimal but
binary, without any terminating LF.

At least you should say "A loose ref file that does not end with...",
because a ref NEVER ends with or contains a newline, and what you are
expecting to be terminated with LF is not even a ref, but its value.

Also, isn't it too strong to say "will be" without giving any
further information, like:

    As valid implementations of Git never created such a loose ref
    file, it may become an error in the future.  Report to the
    git@vger.kernel.org mailing list if you see this error, as we
    need to know what tools created such a file.

or something?

The same comment applies to the next entry.

> @@ -619,6 +619,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
>  		*failure_errno = EINVAL;
>  		return -1;
>  	}
> +
> +	if (trailing)
> +		*trailing = p;
> +
>  	return 0;

In the pre-context of this hunk, if parse_oid_hex_algop() failed to
recognise the initial segment of buf as a valid hexadecimal object
name, it would have already returned, so we know 'p' is always valid
here.  It is the byte that comes immediately after the hexadecimal
object name.

OK.
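To make the role of 'p' (returned as "trailing") concrete, here is a standalone sketch of the regular-ref checks in this patch. It assumes SHA-1 (40 hex digits) rather than asking the repository's hash algorithm, and assumes, as parse_loose_ref_contents() appears to do, that a non-whitespace byte right after the object name makes the content invalid rather than merely trailing. The enum names are hypothetical.

```c
#include <ctype.h>

/* Hypothetical result codes mirroring the fsck message IDs. */
enum loose_ref_status {
	LOOSE_REF_OK,
	LOOSE_REF_BAD_CONTENT,     /* badRefContent */
	LOOSE_REF_MISSING_NEWLINE, /* refMissingNewline */
	LOOSE_REF_TRAILING,        /* trailingRefContent */
};

#define HEXSZ 40 /* assumption: SHA-1 object names */

static enum loose_ref_status check_loose_ref(const char *buf)
{
	const char *p = buf;
	int i;

	/* stops at the first non-hex byte, including the NUL */
	for (i = 0; i < HEXSZ; i++)
		if (!isxdigit((unsigned char)p[i]))
			return LOOSE_REF_BAD_CONTENT;

	/* like "trailing": the byte right after the object name */
	p += HEXSZ;
	if (*p && !isspace((unsigned char)*p))
		return LOOSE_REF_BAD_CONTENT;

	if (!*p)
		return LOOSE_REF_MISSING_NEWLINE;
	if (*p != '\n' || p[1])
		return LOOSE_REF_TRAILING;
	return LOOSE_REF_OK;
}
```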

>  	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
>  				     ref_content.buf, &oid, &referent,
> -				     &type, &failure_errno)) {
> +				     &type, &trailing, &failure_errno)) {
>  		ret = fsck_report_ref(o, &report,
>  				      FSCK_MSG_BAD_REF_CONTENT,
>  				      "invalid ref content");
>  		goto cleanup;
>  	}
>  
> +	if (!(type & REF_ISSYMREF)) {

Just like we punted on S_ISLNK() in an earlier step, for now we
deal only with regular refs in this step.  OK.

> +		if (!*trailing) {
> +			ret = fsck_report_ref(o, &report,
> +					      FSCK_MSG_REF_MISSING_NEWLINE,
> +					      "missing newline");
> +			goto cleanup;
> +		}
> +
> +		if (*trailing != '\n' || *(trailing + 1)) {
> +			ret = fsck_report_ref(o, &report,
> +					      FSCK_MSG_TRAILING_REF_CONTENT,
> +					      "trailing garbage in ref");
> +			goto cleanup;
> +		}

Not limited to this patch, but isn't fsck_report_ref() misdesigned,
or is it just that it is used poorly in these patches?  In these two
callsites, the message string parameter does not give any more
information than what the FSCK_MSG_* enum gives.

In fact, MSG_REF_MISSING_NEWLINE at least says that the complaint is
about refs, but "missing newline" does not even say from what the
newline is missing.  For TRAILING_REF_CONTENT, people may expect to
see what garbage follows the expected contents, but that information
(i.e. contents of *trailing) is lost here.
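One possible shape for a more informative message, sketched with a hypothetical helper (nothing like it exists in the series): it quotes the bytes after the object name and escapes control characters so that the report stays on one line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: describe the trailing garbage of a loose ref.
 * Newlines become "\n", other control bytes become "\xNN".
 */
static void describe_trailing(char *out, size_t outsz, const char *trailing)
{
	static const char prefix[] = "trailing garbage in ref: '";
	size_t used;

	/* need room for the prefix, the closing quote and the NUL */
	if (outsz < sizeof(prefix) + 1) {
		if (outsz)
			out[0] = '\0';
		return;
	}
	memcpy(out, prefix, sizeof(prefix) - 1);
	used = sizeof(prefix) - 1;

	/* keep room for the longest escape (4 bytes), quote and NUL */
	for (; *trailing && used + 6 <= outsz; trailing++) {
		unsigned char c = (unsigned char)*trailing;

		if (c == '\n') {
			out[used++] = '\\';
			out[used++] = 'n';
		} else if (c < 0x20 || c == 0x7f) {
			snprintf(out + used, outsz - used, "\\x%02x", c);
			used += 4;
		} else {
			out[used++] = (char)c;
		}
	}
	out[used++] = '\'';
	out[used] = '\0';
}
```

Long garbage is silently truncated; a real implementation would probably want to mark the truncation.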

* Re: [PATCH v4 4/5] ref: add symref content check for files backend
  2024-09-13 17:18         ` [PATCH v4 4/5] ref: add symref content check for files backend shejialuo
@ 2024-09-18 20:19           ` Junio C Hamano
  2024-09-22 15:53             ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-09-18 20:19 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

Expect that people do not read the body of the message as completing
a paragraph the title started.  I.e. ...

> We have already introduced the checks for regular refs. There is no need
> to check the consistency of the target which the symref points to.
> Instead, we just need to check the content of the symref itself.

... this needs a bit of preamble, like

    We have code that checks regular ref contents, but we do not yet
    check contents of symbolic refs.

> A regular file is accepted as a textual symref if it begins with
> "ref:", followed by zero or more whitespaces, followed by the full
> refname, followed only by whitespace characters. We always write
> a single SP after "ref:" and a single LF after the refname, but
> third-party reimplementations of Git may have taken advantage of the
> looser syntax. Put it more specific, we accept the following contents
> of the symref:
>
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
>
> Thus, we could reuse "refMissingNewline" and "trailingRefContent"
> FSCK_INFOs to do the same retroactive tightening as we introduce for
> regular references.
>
> But we do not allow any other trailing garbage. The followings are bad
> symref contents which will be reported as fsck error by "git-fsck(1)".

This description needs to be updated, as it is unclear if you are
talking about errors we already detect, or if you are planning to
update fsck to notice and report these errors.

> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
>
> And we introduce a new "badReferentName(ERROR)" fsck message to report
> above errors to the user.

OK.
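To make the accepted and rejected shapes quoted above concrete, here is a minimal C sketch of a parser for the looser textual-symref syntax ("ref:", optional whitespace, the refname, then only whitespace). The name parse_textual_symref() is hypothetical and not part of Git; the sketch only isolates the refname and rejects trailing non-whitespace, leaving validation of the refname itself to something like check_refname_format().

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not Git API): accept "ref:", zero or more
 * whitespace, a refname, then nothing but whitespace. On success,
 * return 0 and point *name/*name_len at the refname inside buf
 * (not NUL-terminated); return -1 for anything else.
 */
static int parse_textual_symref(const char *buf, const char **name,
				size_t *name_len)
{
	const char *p, *end;

	if (strncmp(buf, "ref:", 4))
		return -1;			/* not a textual symref */
	p = buf + 4;
	while (isspace((unsigned char)*p))
		p++;				/* optional whitespace after "ref:" */
	for (end = p; *end && !isspace((unsigned char)*end); end++)
		;				/* refname ends at first whitespace */
	if (end == p)
		return -1;			/* "ref:" with no refname */
	for (const char *q = end; *q; q++)
		if (!isspace((unsigned char)*q))
			return -1;		/* e.g. "master garbage" */
	*name = p;
	*name_len = (size_t)(end - p);
	return 0;
}
```

Under this sketch the three accepted examples all parse to "refs/heads/master", while both "garbage" examples are rejected.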

> In order to check the content of the symref, create a function
> "files_fsck_symref_target". It will first check whether the "referent"
> is under the "refs/" directory, if not, we will report "escapeReferent"
> fsck error message to notify the user of this situation.
>
> Then, we will first check whether the symref content misses the newline
> by peeking the last byte of the "referent" to see whether it is '\n'.

"Then, we will first" -> "Then it checks" or something.

You already consumed "first" for the check to limit the referent to
those under the "refs/" hierarchy.

> And we will remember the untrimmed length of the "referent" and call
> "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format"
> to check whether the trimmed referent format is valid. If not, we will
> report to the user that the symref points to referent which has invalid
> format. If it is valid, we will compare the untrimmed length and trimmed
> length, if they are not the same, we need to warn the user there is some
> trailing garbage in the symref content.

That is an implementation detail of what you did.  But if the
implementation were buggy and did not do exactly what you intended,
the above description gives no information to help others fix
it up so that it works as you intended it to work, because you do
not explain it.

So what did you want to achieve in the third step (the first being
"limit to refs/ hierarchy", the second being "no incomplete lines
allowed")?

    Third, we want to make sure that the contents of a textual
    symref MUST have a single LF after the target refname and
    NOTHING ELSE.

or something.

> At last, we need to check whether the referent is a directory. We cannot

"a directory" -> "an existing directory"?

I am not comfortable seeing the word "directory" used in this
proposed log message, as some refs could be stored in the packed
backend and are referenced by the symbolic ref you are inspecting
(this comment also refers to the "refs/ directory" you mentioned
earlier as "the first check").

    Lastly, a symbolic ref MUST either point to an existing ref,
    or if the referent does not exist, it MUST NOT be a leading
    subpath for another existing ref (e.g., when "refs/heads/main"
    exists, a symbolic ref that points at "refs/heads" is a no-no).

or something (but again, I am open to a phrasing better than
"subpath").

Design question.  What do we want to do when we have no loose refs
under the "refs/heads/historical/" hierarchy (i.e. all of them are
in the packed-refs file), hence the ".git/refs/heads/historical"
directory does not exist on the filesystem, and a symbolic ref points
at "refs/heads/historical"?  Shouldn't we give the same error whether
the .git/refs/heads/historical directory exists or not, as long as
the refs/heads/historical/main branch exists (in the packed-refs
backend)?
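A sketch of the "MUST NOT be a leading subpath for another existing ref" rule, assuming the checker has a flat list of refnames drawn from both loose and packed storage. Because it compares refnames instead of stat()ing directories, it gives the same answer whether or not .git/refs/heads/historical exists on disk, which is the behavior the design question asks for. All helper names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: is `referent` itself one of the existing refnames? */
static int ref_exists(const char *referent, const char *const *refs, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(refs[i], referent))
			return 1;
	return 0;
}

/*
 * Hypothetical: does some existing ref live below `referent`?
 * "refs/heads" is a leading subpath of "refs/heads/main" because the
 * matching prefix is followed by '/'; "refs/heads/mai" is not.
 */
static int is_leading_subpath(const char *referent, const char *const *refs,
			      size_t nr)
{
	size_t len = strlen(referent);

	for (size_t i = 0; i < nr; i++)
		if (!strncmp(refs[i], referent, len) && refs[i][len] == '/')
			return 1;
	return 0;
}

/*
 * The rule as worded above: the referent either exists, or it is
 * dangling without being a leading subpath of an existing ref.
 * `refs` must list packed refs as well as loose ones.
 */
static int symref_target_ok(const char *referent, const char *const *refs,
			    size_t nr)
{
	return ref_exists(referent, refs, nr) ||
	       !is_leading_subpath(referent, refs, nr);
}
```

Note that the review goes on to question whether this rule should be enforced at all, since such a symref behaves like any other dangling symref.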

> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index 8827137ef0..03bcb77972 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -28,6 +28,12 @@
>  `badRefName`::
>  	(ERROR) A ref has an invalid format.
>  
> +`badReferentFiletype`::
> +	(ERROR) The referent of a symref has a bad file type.
> +
> +`badReferentName`::
> +	(ERROR) The referent name of a symref is invalid.
> +
>  `badTagName`::
>  	(INFO) A tag has an invalid format.
>  
> @@ -49,6 +55,9 @@
>  `emptyName`::
>  	(WARN) A path contains an empty name.
>  
> +`escapeReferent`::
> +	(ERROR) The referent of a symref is outside the "ref" directory.

I am not sure starting this as ERROR is wise.  Users and third-party
tools make creative uses of the system and I cannot offhand think of
an argument why it should be forbidden to create a symbolic link to
our own HEAD or to some worktree-specific ref in another worktree.
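As a hedged illustration of a less strict alternative, a hypothetical classify_referent() could sort referents into "under refs/", root refs such as HEAD, and the "main-worktree/" and "worktrees/<id>/" prefixes that git-worktree(1) documents for reaching another worktree's refs, leaving only the remaining category as a candidate for a (possibly non-ERROR) report. The enum, the helpers (has_prefix() stands in for Git's starts_with()), and the uppercase-and-underscore test for root refs are assumptions made for this sketch.

```c
#include <assert.h>
#include <string.h>

enum referent_location {
	REFERENT_UNDER_REFS,	/* "refs/..." */
	REFERENT_ROOT_REF,	/* e.g. "HEAD" */
	REFERENT_WORKTREE,	/* "main-worktree/..." or "worktrees/<id>/..." */
	REFERENT_ELSEWHERE,	/* anything else */
};

/* Local stand-in for Git's starts_with(). */
static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/* Assumption for this sketch: root refs use only 'A'-'Z' and '_'. */
static int looks_like_root_ref(const char *s)
{
	if (!*s)
		return 0;
	for (; *s; s++)
		if ((*s < 'A' || *s > 'Z') && *s != '_')
			return 0;
	return 1;
}

static enum referent_location classify_referent(const char *referent)
{
	if (has_prefix(referent, "refs/"))
		return REFERENT_UNDER_REFS;
	if (has_prefix(referent, "main-worktree/") ||
	    has_prefix(referent, "worktrees/"))
		return REFERENT_WORKTREE;
	if (looks_like_root_ref(referent))
		return REFERENT_ROOT_REF;
	return REFERENT_ELSEWHERE;	/* candidate for a WARN/INFO report */
}
```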

> +	size_t len = referent->len - 1;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!starts_with(referent->buf, "refs/")) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_ESCAPE_REFERENT,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	if (referent->buf[referent->len - 1] != '\n') {

As you initialized "len" to "referent->len-1" earlier, wouldn't it
be more natural to use it here?  That would match the incrementing of
len++ later in this block.

> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_REF_MISSING_NEWLINE,
> +				      "missing newline");
> +		len++;
> +	}

Having said that, the above should be simplified more like:

 * declare but do not initialize "len".  Better yet, declare "orig_len"
   and leave it uninitialized.

 * do not touch "len++" in the above block (actually, you can
   discard the above "if(it does not end with LF)" block, see
   below).

 * instead grab "referent->len" in "len" (or "orig_len") immediately
   before you first modify referent, i.e. before strbuf_rtrim() call.

	orig_len = referent->len;
	orig_last_byte = referent->buf[orig_len - 1];

> +	strbuf_rtrim(referent);
> +	if (check_refname_format(referent->buf, 0)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_REFERENT_NAME,
> +				      "points to refname with invalid format");

Similar to an earlier step, the message does not give any more
information than the enum.  Wouldn't the user who got this error
want to learn what referent->buf said and which part of it was bad
in the same message, instead of having to look it up on their own
after fsck finishes?

> +		goto out;
> +	}

At this point we know check_refname_format() is happy with what is
left after rtrimming the referent.  There are four cases:

 - rtrim() did not trim anything (orig_len == referent->len); the file
   lacked the terminating LF.

 - rtrim() trimmed one byte (orig_len - 1 == referent->len) and
   the byte was not LF (orig_last_byte != '\n').  The file lacked
   the terminating LF.

 - rtrim() trimmed exactly one byte (orig_len - 1 == referent->len)
   and the byte was LF (orig_last_byte == '\n').  There is no error.

 - all other cases, i.e., rtrim() trimmed two or more bytes.  The
   file had trailing whitespaces after a valid referent that passed
   check_refname_format().

So in short,

	if (referent->len == orig_len ||
	    (referent->len == orig_len - 1 && orig_last_byte != '\n')) {
		FSCK_MSG_REF_MISSING_NEWLINE;
	} else if (referent->len < orig_len - 1) {
		FSCK_MSG_REF_TRAILING_WHITESPACE;
	}

can replace the next block you wrote, and we can also remove the
earlier "it is an error if it does not end with '\n'", I think.
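The four cases above can be sketched as a standalone C function. It assumes the referent has already passed check_refname_format(), uses a plain loop roughly equivalent to strbuf_rtrim(), and returns hypothetical enum values instead of calling fsck_report_ref(); the empty-buffer guard only avoids the size_t underflow in "orig_len - 1".

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum symref_tail {
	TAIL_OK,		/* exactly one LF after the refname */
	TAIL_MISSING_NEWLINE,	/* no terminating LF */
	TAIL_TRAILING_SPACE,	/* extra whitespace after the refname */
};

/* `buf`/`orig_len`: a valid refname possibly followed by whitespace. */
static enum symref_tail classify_tail(const char *buf, size_t orig_len)
{
	size_t len = orig_len;
	char orig_last_byte;

	if (!orig_len)
		return TAIL_MISSING_NEWLINE;	/* cannot follow a valid refname */
	orig_last_byte = buf[orig_len - 1];	/* remember before trimming */
	while (len && isspace((unsigned char)buf[len - 1]))
		len--;				/* roughly what strbuf_rtrim() does */

	if (len == orig_len ||
	    (len == orig_len - 1 && orig_last_byte != '\n'))
		return TAIL_MISSING_NEWLINE;	/* cases 1 and 2 */
	if (len < orig_len - 1)
		return TAIL_TRAILING_SPACE;	/* case 4: two or more bytes */
	return TAIL_OK;				/* case 3: exactly one LF */
}
```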

> +	if (len != referent->len) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_TRAILING_REF_CONTENT,
> +				      "trailing garbage in ref");

As check_refname_format() was happy, the difference between orig_len
and referent->len comes only from trailing whitespace, i.e. it
is not that it had arbitrary garbage.  Shouldn't we be more explicit
about that?

> +	/*
> +	 * Dangling symrefs are common and so we don't report them.
> +	 */
> +	if (lstat(referent_path->buf, &st)) {
> +		if (errno != ENOENT) {
> +			ret = error_errno(_("unable to stat '%s'"),
> +					  referent_path->buf);
> +		}
> +		goto out;
> +	}
> +
> +	/*
> +	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
> +	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
> +	 * check the file type of the target.
> +	 */
> +	if (S_ISDIR(st.st_mode)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_REFERENT_FILETYPE,
> +				      "points to the directory");
> +		goto out;
> +	}

If referent_path->buf refers to "refs/heads/historical/", and all
the branches under the hierarchy have been sent to packed-refs,
then this check will not trigger.

I wonder if this check is the right thing to enforce in the first
place, though.

As far as the end user is concerned, refs/heads/historical/master
branch still exists, and there is no refs/heads/historical branch, so
such a symbolic ref, for all intents and purposes, is the same as
any other dangling symbolic refs, no?

Of course, "git update-ref SUCH_A_SYMREF HEAD" will complain because
refs/heads/historical/master exists, with something like

    "refs/heads/historical/master" exists, cannot create "refs/heads/historical"

but that is to be expected.  If you remove the last branch in the
refs/heads/historical hierarchy, you should be able to do such an
update-ref to instantiate refs/heads/historical as a regular ref.

> @@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
>  					      "trailing garbage in ref");
>  			goto cleanup;
>  		}
> +	} else {
> +		strbuf_addf(&referent_path, "%s/%s",
> +			    ref_store->gitdir, referent.buf);
> +		/*
> +		 * the referent may contain the spaces and the newline, need to
> +		 * trim for path.
> +		 */
> +		strbuf_rtrim(&referent_path);

I doubt this is a good design.  We have referent, and the symbolic
ref checker knows that the true referent refname may be followed by
whitespace, so instead of inventing referent_path here, it would
be a better design to let files_fsck_symref_target() decide
what file to open and check based on referent, no?  Give it the
refstore or refstore's gitdir and have it do the concatenation with
the rtrimmed contents of referent->buf after it has inspected it,
perhaps?
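One way to read this suggestion, as a sketch: the checker receives the untrimmed referent plus the gitdir, and builds the on-disk path only after trimming. referent_to_path() is a hypothetical name, and snprintf() stands in for the strbuf_addf() call the patch uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: trim trailing whitespace (including the LF) from
 * the referent, then join it to gitdir. Returns 0 on success, -1 if the
 * result does not fit in `out`.
 */
static int referent_to_path(const char *gitdir, const char *referent,
			    char *out, size_t outsz)
{
	size_t len = strlen(referent);
	int n;

	while (len && isspace((unsigned char)referent[len - 1]))
		len--;		/* the path must not include "\n" or spaces */
	n = snprintf(out, outsz, "%s/%.*s", gitdir, (int)len, referent);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Keeping the trimming and the path construction in one place means the caller no longer has to know that the raw referent may carry trailing whitespace.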

> +		ret = files_fsck_symref_target(o, &report,
> +					       &referent,
> +					       &referent_path);

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [PATCH v4 5/5] ref: add symlink ref content check for files backend
  2024-09-13 17:18         ` [PATCH v4 5/5] ref: add symlink ref " shejialuo
@ 2024-09-18 23:02           ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-09-18 23:02 UTC
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> Because we consider deprecating writing the symbolic links and for
> reading, we may or may not deprecate. We first need to assess whether
> symbolic links may still be used. So, add a new fsck message
> "symlinkRef(INFO)" to let the user be aware of this information.

If that is the intention, then the documentation entry is somewhat
out of line.

> +`symlinkRef`::
> +	(INFO) A symref uses the symbolic link. This kind of symref may
> +	be considered ERROR in the future when totally dropping the
> +	symlink support.

    A symbolic link is used as a symref.  Report to the
    git@vger.kernel.org mailing list if you see this error, as we
    are assessing the feasibility of dropping support for using
    symbolic links as symrefs.

But quite honestly, I do not think it is necessary to deprecate (let
alone remove) the support on the reading side.
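For reference, detecting the case a symlinkRef message would flag comes down to lstat() plus S_ISLNK(); a minimal POSIX sketch follows. read_symlink_ref() is a hypothetical helper, not Git's implementation, and it treats a target that fills the buffer as an error rather than risk a silently truncated result.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 and copy the link target into `target`
 * if `path` is a symbolic link, 0 if it is something else, and -1 on
 * error (including a target too long for `target`). targetsz >= 2.
 */
static int read_symlink_ref(const char *path, char *target, size_t targetsz)
{
	struct stat st;
	ssize_t n;

	if (lstat(path, &st))		/* lstat(): do not follow the link */
		return -1;
	if (!S_ISLNK(st.st_mode))
		return 0;
	n = readlink(path, target, targetsz - 1);
	if (n < 0 || (size_t)n >= targetsz - 1)
		return -1;		/* error, or possibly truncated */
	target[n] = '\0';		/* readlink() does not NUL-terminate */
	return 1;
}
```

Because lstat() does not follow the link, a dangling symlink ref is still recognized as a symlink.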

* Re: [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend
  2024-09-18 18:59           ` Junio C Hamano
@ 2024-09-22 14:58             ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-09-22 14:58 UTC
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Wed, Sep 18, 2024 at 11:59:45AM -0700, Junio C Hamano wrote:

[snip]

> The above reads as if you are, in preparation to "port" the checks
> we have in "fsck" to elsewhere (presumably to "refs verify"), you
> are removing the checks that _will_ become redundant from "fsck".
> 
> But that does not seem to be what is happening.  Let me try to
> paraphrase, in order to check my understanding of what you wanted to
> say:
> 
>     "git-fsck(1) has some consistency checks for regular refs.  As
>     we want to align the checks "git refs verify" performs with
>     them (and eventually call the unified code that checks refs from
>     both), port the logic "git fsck" has to "git refs verify".
> 

Thanks. I have re-read my words; I did not explain this well.

> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> > +		ret = error_errno(_("unable to read ref '%s/%s'"),
> > +				  refs_check_dir, iter->relative_path);
> 
> Is there a reason why we cannot use report.path aka refname.buf,
> and instead we have to recompute the same path again?
> 

Thanks for pointing this out. I wrote this part a long time ago and
thought it was unrelated to the fsck part, so I forgot to change it.

> Should this error be propagated back to the caller, not just to the
> end-user, by a call to fsck_report_ref(), like you do for a ref file
> that has questionable contents?  If ref iteration (like for-each-ref)
> claims there is this ref, and you cannot read its value when you try
> to use it, it is just as bad as having a loose ref file that has
> unusable contents, isn't it?
> 

I agree. My initial motivation for this design was that I thought this
is an OS-specific issue (the file may be read successfully next time),
so I did not put it into the fsck part. But it makes sense that we
should report this.

> It is a separate matter if such a failure mode deserves its own
> error code (FSCK_MSG_UNREADABLE_REF) or can be rolled into the same
> FSCK_MSG_BAD_REF_CONTENT.  I can see arguments for both sides and
> offhand have no strong preference either way.
> 

We could just use "FSCK_MSG_BAD_REF_CONTENT" and add a message "cannot
open this file". I guess this should be enough.


* Re: [PATCH v4 3/5] ref: add more strict checks for regular refs
  2024-09-18 19:39           ` Junio C Hamano
@ 2024-09-22 15:06             ` shejialuo
  2024-09-22 16:48               ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-22 15:06 UTC
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Wed, Sep 18, 2024 at 12:39:13PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > +`refMissingNewline`::
> > +	(INFO) A ref does not end with newline. This will be
> > +	considered an error in the future.
> 
> It is ONLY the files backend's loose-ref representation that stores the
> object name that is the value of the ref as hexadecimal text
> terminated with a newline.  With the packed backend, even if the file
> ends with an incomplete line, it would be confusing to say that such
> lack of terminating LF is associated with a particular ref.  With the
> reftable backend, the object name may not even be hexadecimal but
> binary without any terminating LF.
> 
> At least you should say "A loose ref file that does not end with...",
> because a ref NEVER ends with or contains a newline, and what you are
> expecting to be terminated with LF is not even a ref, but the value
> of it.
> 

Thanks, I will improve this in the next version.

> Also, isn't it too strong to say "will be" without giving any
> further information, like:
> 
>     As valid implementations of Git never created such a loose ref
>     file, it may become an error in the future.  Report to the
>     git@vger.kernel.org mailing list if you see this error, as we
>     need to know what tools created such a file.
> 
> or something?
> 

This is nice. I understand the intention here.

> > +		if (!*trailing) {
> > +			ret = fsck_report_ref(o, &report,
> > +					      FSCK_MSG_REF_MISSING_NEWLINE,
> > +					      "missing newline");
> > +			goto cleanup;
> > +		}
> > +
> > +		if (*trailing != '\n' || *(trailing + 1)) {
> > +			ret = fsck_report_ref(o, &report,
> > +					      FSCK_MSG_TRAILING_REF_CONTENT,
> > +					      "trailing garbage in ref");
> > +			goto cleanup;
> > +		}
> 
> Not limited to this patch, but isn't fsck_report_ref() misdesigned,
> or is it just being used poorly in these patches?  At these two
> call sites, the message string parameter does not give any more
> information than what the FSCK_MSG_* enum gives.
> 
> In fact, MSG_REF_MISSING_NEWLINE at least says that the complaint is
> about refs, but "missing newline" does not even say from what the
> newline is missing.  For TRAILING_REF_CONTENT, people may expect to
> see what garbage follows the expected contents, but that information
> (i.e. contents of *trailing) is lost here.
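A sketch of what a more informative regular-ref check could look like: it keeps the same two tests (*trailing is NUL, or anything other than a single LF follows) but quotes the offending bytes in the message. It assumes a SHA-1 repository (40 hex digits; SHA-256 would need 64) and uses a hypothetical enum and function name instead of the FSCK_MSG_* ids.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum loose_ref_status {
	REF_OK,
	REF_BAD_CONTENT,	/* not an object name */
	REF_MISSING_NEWLINE,	/* "!*trailing" */
	REF_TRAILING_CONTENT,	/* "*trailing != '\n' || *(trailing + 1)" */
};

static enum loose_ref_status check_loose_ref(const char *content,
					     char *msg, size_t msgsz)
{
	const size_t hexsz = 40;	/* assumption: SHA-1 object names */
	const char *trailing;

	for (size_t i = 0; i < hexsz; i++) {
		/* stops at the NUL of a short file, so no over-read */
		if (!isxdigit((unsigned char)content[i])) {
			snprintf(msg, msgsz, "'%s' is not a valid object name",
				 content);
			return REF_BAD_CONTENT;
		}
	}
	trailing = content + hexsz;
	if (!*trailing) {
		snprintf(msg, msgsz, "loose ref file does not end with LF");
		return REF_MISSING_NEWLINE;
	}
	if (*trailing != '\n' || trailing[1]) {
		/* show the user exactly what followed the object name */
		snprintf(msg, msgsz, "trailing content after object name: '%s'",
			 trailing);
		return REF_TRAILING_CONTENT;
	}
	msg[0] = '\0';
	return REF_OK;
}
```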

I agree with you here; I used overly general words to describe what
happens. I will improve this. Actually, I find it hard to find words for
"MSG_REF_MISSING_NEWLINE". I think we should say:

	LF should be at the end of the file.

Thanks,
Jialuo

* Re: [PATCH v4 4/5] ref: add symref content check for files backend
  2024-09-18 20:19           ` Junio C Hamano
@ 2024-09-22 15:53             ` shejialuo
  2024-09-22 16:55               ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-22 15:53 UTC
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Wed, Sep 18, 2024 at 01:19:13PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> Expect that people do not read the body of the message as completing
> a paragraph the title started.  I.e. ...
> 
> > We have already introduced the checks for regular refs. There is no need
> > to check the consistency of the target which the symref points to.
> > Instead, we just need to check the content of the symref itself.
> 
> ... this needs a bit of preamble, like
> 
>     We have code that checks regular ref contents, but we do not yet
>     check contents of symbolic refs.
> 

Thanks, I will improve this in the next version.

> > A regular file is accepted as a textual symref if it begins with
> > "ref:", followed by zero or more whitespaces, followed by the full
> > refname, followed only by whitespace characters. We always write
> > a single SP after "ref:" and a single LF after the refname, but
> > third-party reimplementations of Git may have taken advantage of the
> > looser syntax. To put it more specifically, we accept the following contents
> > of the symref:
> >
> > 1. "ref: refs/heads/master   "
> > 2. "ref: refs/heads/master   \n  \n"
> > 3. "ref: refs/heads/master\n\n"
> >
> > Thus, we could reuse "refMissingNewline" and "trailingRefContent"
> > FSCK_INFOs to do the same retroactive tightening as we introduce for
> > regular references.
> >
> > But we do not allow any other trailing garbage. The following are bad
> > symref contents which will be reported as fsck error by "git-fsck(1)".
> 
> This description needs to be updated, as it is unclear if you are
> talking about errors we already detect, or if you are planning to
> update fsck to notice and report these errors.
> 

Yes. When I was writing this part, I struggled to find the right words.
I kept thinking about how to express the connection between the
current patch and the previous one.

> > And we will remember the untrimmed length of the "referent" and call
> > "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format"
> > to check whether the trimmed referent format is valid. If not, we will
> > report to the user that the symref points to referent which has invalid
> > format. If it is valid, we will compare the untrimmed length and trimmed
> > length, if they are not the same, we need to warn the user there is some
> > trailing garbage in the symref content.
> 
> That is an implementation detail of what you did.  But if the
> implementation were buggy and did not do exactly what you intended,
> the above description gives no information to help others fix
> it up so that it works as you intended it to work, because you do
> not explain it.
> 
> So what did you want to achieve in the third step (the first being
> "limit to refs/ hierarchy", the second being "no incomplete lines
> allowed")?
> 
>     Third, we want to make sure that the contents of a textual
>     symref MUST have a single LF after the target refname and
>     NOTHING ELSE.
> 
> or something.
> 

Given the above comments, I need to reorganize the commit message of
this patch to make things clear.

> "a directory" -> "an existing directory"?
> 
> I am not comfortable seeing the word "directory" used in this
> proposed log message, as some refs could be stored in the packed
> backend and are referenced by the symbolic ref you are inspecting
> (this comment also refers to the "refs/ directory" you mentioned
> earlier as "the first check").
> 
>     Lastly, a symbolic ref MUST either point to an existing ref,
>     or if the referent does not exist, it MUST NOT be a leading
>     subpath for another existing ref (e.g., when "refs/heads/main"
>     exists, a symbolic ref that points at "refs/heads" is a no-no).
> 
> or something (but again, I am open to a phrasing better than
> "subpath").
> 
> Design question.  What do we want to do when we have no loose refs
> under the "refs/heads/historical/" hierarchy (i.e. all of them are
> in the packed-refs file), hence the ".git/refs/heads/historical"
> directory does not exist on the filesystem, and a symbolic ref points
> at "refs/heads/historical"?  Shouldn't we give the same error whether
> the .git/refs/heads/historical directory exists or not, as long as
> the refs/heads/historical/main branch exists (in the packed-refs
> backend)?
> 

I guess I need to think carefully here. Actually, my intention was to
concentrate on loose refs first and then take packed refs into
consideration.

However, from what you have said above, it seems I cannot do this:
they are connected. But at present I am not familiar enough with the
behavior of packed refs to answer all the questions above.

I have decided to first understand what the packed backend does. So
this series may be stalled for some time until I have a good
understanding and can rethink the design here.

> > +`escapeReferent`::
> > +	(ERROR) The referent of a symref is outside the "ref" directory.
> 
> I am not sure starting this as ERROR is wise.  Users and third-party
> tools make creative uses of the system and I cannot offhand think of
> an argument why it should be forbidden to create a symbolic link to
> our own HEAD or to some worktree-specific ref in another worktree.
> 

Do we allow this cross-access (hack)? It might cause some trouble from
my perspective.

> > +	if (referent->buf[referent->len - 1] != '\n') {
> 
> As you initialized "len" to "referent->len-1" earlier, wouldn't it
> be more natural to use it here?  That would match the incrementing of
> len++ later in this block.
> 

Yes, exactly.

> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_REF_MISSING_NEWLINE,
> > +				      "missing newline");
> > +		len++;
> > +	}
> 
> Having said that, the above should be simplified more like:
> 
>  * declare but do not initialize "len".  Better yet, declare "orig_len"
>    and leave it uninitialized.
> 
>  * do not touch "len++" in the above block (actually, you can
>    discard the above "if(it does not end with LF)" block, see
>    below).
> 
>  * instead grab "referent->len" in "len" (or "orig_len") immediately
>    before you first modify referent, i.e. before strbuf_rtrim() call.
> 
> 	orig_len = referent->len;
> 	orig_last_byte = referent->buf[orig_len - 1];
> 

I agree.

> > +	strbuf_rtrim(referent);
> > +	if (check_refname_format(referent->buf, 0)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_REFERENT_NAME,
> > +				      "points to refname with invalid format");
> 
> Similar to an earlier step, the message does not give any more
> information than the enum.  Wouldn't the user who got this error
> want to learn what referent->buf said and which part of it was bad
> in the same message, instead of having to look it up on their own
> after fsck finishes?
> 

Yes, I agree. I will improve this.

> > +		goto out;
> > +	}
> 
> At this point we know check_refname_format() is happy with what is
> left after rtrimming the referent.  There are four cases:
> 
>  - rtrim() did not trim anything (orig_len == referent->len); the file
>    lacked the terminating LF.
> 
>  - rtrim() trimmed one byte (orig_len - 1 == referent->len) and
>    the byte was not LF (orig_last_byte != '\n').  The file lacked
>    the terminating LF.
> 
>  - rtrim() trimmed exactly one byte (orig_len - 1 == referent->len)
>    and the byte was LF (orig_last_byte == '\n').  There is no error.
> 
>  - all other cases, i.e., rtrim() trimmed two or more bytes.  The
>    file had trailing whitespaces after a valid referent that passed
>    check_refname_format().
> 

That's so clear. My implementation is not good compared with this.

> So in short,
> 
> 	if (referent->len == orig_len ||
>	    (referent->len == orig_len - 1 && orig_last_byte != '\n')) {
> 		FSCK_MSG_REF_MISSING_NEWLINE;
> 	} else if (referent->len < orig_len - 1) {
> 		FSCK_MSG_REF_TRAILING_WHITESPACE;
> 	}
> 
> can replace the next block you wrote, and we can also remove the
> earlier "it is an error if it does not end with '\n'", I think.
> 
> > +	if (len != referent->len) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_TRAILING_REF_CONTENT,
> > +				      "trailing garbage in ref");
> 
> As check_refname_format() was happy, the difference between orig_len
> and referent->len comes only from trailing whitespace, i.e. it
> is not that it had arbitrary garbage.  Shouldn't we be more explicit
> about that?
> 

Yes, I made a lot of mistakes when calling "fsck_report_ref". I will
report the exact garbage content to the user.

> > +	/*
> > +	 * Dangling symrefs are common and so we don't report them.
> > +	 */
> > +	if (lstat(referent_path->buf, &st)) {
> > +		if (errno != ENOENT) {
> > +			ret = error_errno(_("unable to stat '%s'"),
> > +					  referent_path->buf);
> > +		}
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
> > +	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
> > +	 * check the file type of the target.
> > +	 */
> > +	if (S_ISDIR(st.st_mode)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_REFERENT_FILETYPE,
> > +				      "points to the directory");
> > +		goto out;
> > +	}
> 
> If referent_path->buf refers to "refs/heads/historical/", and all
> the branches under the hierarchy have been sent to packed-refs,
> then this check will not trigger.
> 

Yes, because "refs/heads/historical" will not appear in the filesystem.

> I wonder if this check is the right thing to enforce in the first
> place, though.
> 
> As far as the end user is concerned, refs/heads/historical/master
> branch still exists, and there is no refs/heads/historical branch, so
> such a symbolic ref, for all intents and purposes, is the same as
> any other dangling symbolic refs, no?
> 
> Of course, "git update-ref SUCH_A_SYMREF HEAD" will complain because
> refs/heads/historical/master exists, with something like
> 
>     "refs/heads/historical/master" exists, cannot create "refs/heads/historical"
> 
> but that is to be expected.  If you remove the last branch in the
> refs/heads/historical hierarchy, you should be able to do such an
> update-ref to instantiate refs/heads/historical as a regular ref.
> 

I am a little surprised here. I tried this and found that the directory
is automatically converted to a regular file in the filesystem. So I
agree with you here: we should not check this, because we allow a
symref to point to a directory. As long as there are no loose or
packed refs under this directory, we can use "git update-ref" on this
symref.

Thanks,

> > @@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
> >  					      "trailing garbage in ref");
> >  			goto cleanup;
> >  		}
> > +	} else {
> > +		strbuf_addf(&referent_path, "%s/%s",
> > +			    ref_store->gitdir, referent.buf);
> > +		/*
> > +		 * the referent may contain the spaces and the newline, need to
> > +		 * trim for path.
> > +		 */
> > +		strbuf_rtrim(&referent_path);
> 
> I doubt this is a good design.  We have referent, and the symbolic
> ref checker knows that the true referent refname may be followed by
> whitespace, so instead of inventing referent_path here, it would
> be a better design to let files_fsck_symref_target() decide
> what file to open and check based on referent, no?  Give it the
> refstore or refstore's gitdir and have it do the concatenation with
> the rtrimmed contents of referent->buf after it has inspected it,
> perhaps?
> 

Yes, I agree with you here. We should use "files_fsck_symref_target" to
do this.


----

From this review, I think I need to understand more about how the
files backend and the packed backend behave. Thanks for your very
dedicated reviews. The next version may take me more time, so there
may be some delay.

Thanks,
Jialuo

* Re: [PATCH v4 3/5] ref: add more strict checks for regular refs
  2024-09-22 15:06             ` shejialuo
@ 2024-09-22 16:48               ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-09-22 16:48 UTC
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> I agree with you here; I used overly general words to describe what
> happens. I will improve this. Actually, I find it hard to find words for
> "MSG_REF_MISSING_NEWLINE". I think we should say:
>
> 	LF should be at the end of the file.

With the way the fsck_report_ref() function is currently used, giving
a human-readable message for an enum can be done at a much higher
layer (i.e. in that function, not by its callers).

That is what I meant by "misdesigned"---if one message enum always
corresponds to one human-readable message, there is not much point
in forcing callers to supply both, is there?
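A minimal sketch of that design, with hypothetical names throughout (they stand in for the FSCK_MSG_* ids and fsck_report_ref()): the reporter owns one fixed description per message id, and callers pass only the detail the id cannot know, such as the trailing bytes, or NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message ids, standing in for the FSCK_MSG_* enum. */
enum ref_msg_id {
	MSG_REF_MISSING_NEWLINE,
	MSG_TRAILING_REF_CONTENT,
	MSG_BAD_REFERENT_NAME,
	MSG_NR,
};

/* One human-readable description per id, owned by the reporter. */
static const char *const ref_msg_text[MSG_NR] = {
	[MSG_REF_MISSING_NEWLINE] = "ref file does not end with LF",
	[MSG_TRAILING_REF_CONTENT] = "trailing content after the ref value",
	[MSG_BAD_REFERENT_NAME] = "symref points to a refname with invalid format",
};

/* Callers supply only what the enum cannot know, or NULL. */
static int format_ref_report(char *out, size_t outsz, const char *refname,
			     enum ref_msg_id id, const char *detail)
{
	if (detail)
		return snprintf(out, outsz, "%s: %s: %s",
				refname, ref_msg_text[id], detail);
	return snprintf(out, outsz, "%s: %s", refname, ref_msg_text[id]);
}
```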

* Re: [PATCH v4 4/5] ref: add symref content check for files backend
  2024-09-22 15:53             ` shejialuo
@ 2024-09-22 16:55               ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-09-22 16:55 UTC
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

>> > +`escapeReferent`::
>> > +	(ERROR) The referent of a symref is outside the "ref" directory.
>> 
>> I am not sure starting this as ERROR is wise.  Users and third-party
>> tools make creative uses of the system and I cannot offhand think of
>> an argument why it should be forbidden to create a symbolic link to
>> our own HEAD or to some worktree-specific ref in another worktree.
>> 
> Do we allow this cross-access (hack)? It might cause some trouble from
> my perspective.

If the current implementation allows users to set it up and take
advantage of it, then it is not a hack.  It would cause breakage
if we made it an error.  Does such a symref successfully refer
to the referent right now?  I think it does.

Thanks.

* [PATCH v5 0/9] add ref content check for files backend
  2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
                           ` (5 preceding siblings ...)
  2024-09-18 16:49         ` [PATCH v4 0/5] add " Junio C Hamano
@ 2024-09-29  7:13         ` shejialuo
  2024-09-29  7:15           ` [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
                             ` (11 more replies)
  6 siblings, 12 replies; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:13 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This version addresses a lot of review comments from Junio.

1. [PATCH v5 1/9] enhances the commit message compared with the previous
[PATCH v4 1/5].

2. [PATCH v5 2/9] is new and was not present in the previous version.
It supports checking refs in multiple worktrees. In the GSoC PATCH
<ZrSqMmD-quQ18a9F@ArchLinux.localdomain>, I did not implement support
for checking worktrees. However, we need to add it due to the following
review from Junio:
  > > +`escapeReferent`::
  > > +	(ERROR) The referent of a symref is outside the "ref" directory.
  >
  > I am not sure starting this as ERROR is wise.  Users and third-party
  > tools make creative uses of the system and I cannot offhand think of
  > an argument why it should be forbidden to create a symbolic link to
  > our own HEAD or to some worktree-specific ref in another worktree.
When checking whether the referent escapes the "refs" directory, I did
not consider worktrees. So I decided to first add checks for multiple
worktrees and then add a new test for them.

3. [PATCH v5 3/9] has the same intent as [PATCH v4 2/5].

  + Improve the commit message as suggested by Junio.
  + Use "fsck_ref_report" to tell the user that we cannot read the file
  instead of reporting a generic error.
  + For the "FSCK_MSG_BAD_REF_CONTENT" message id, instead of reporting
  the uninformative message "invalid ref content", report the actual
  content of the ref, i.e., "ref_content.buf".

4. [PATCH v5 4/9] has the same intent as [PATCH v4 3/5].

  + Instead of using the specific "refMissingNewline" and
  "trailingRefContent" fsck messages, create a single fsck info message,
  "unofficialFormattedRef".
  + Following Junio's advice, use "fsck_ref_report" to report more
  useful information, for example, what the trailing garbage is.

5. [PATCH v4 4/5] is split into four commits, [PATCH v5 5/9] through
[PATCH v5 8/9]. I did this because this version introduces the worktree
checks, and the version 4 commit message was already a little messy.
Although the C code did not change much, a single commit message would
have been hard to write and confusing for reviewers.

6. [PATCH v5 5/9] adds checks for textual symrefs, except for the
escape situation.

  + Because the commit is split here, it is easier to write a clean
  commit message, which has been revised according to Junio's review.
  + Following Junio's advice, check the symref more gracefully, which
  also makes the commit message cleaner.
  + Drop the check for a "referent" pointing to a directory. We allow
  this because it is a dangling symref, so there is no need to check
  it. This lets us drop the parameter "referent_path" in
  "files_fsck_symref_target()".
  + Enhance the "fsck_ref_report" calls to report more useful
  information.

7. [PATCH v5 6/9] adds the check for the escape situation and introduces
a new fsck message "escapeReferent(INFO)".

8. [PATCH v5 7/9] refines the escape check for multiple worktrees. In
practice, we allow a symref in the primary worktree or in one of the
linked worktrees to point to a ref of one of the linked worktrees, so we
should not warn about this.

9. [PATCH v5 8/9] enhances the test script for worktrees.

10. [PATCH v5 9/9] has the same intent as [PATCH v4 5/5], with few
changes.

Because I had not synced with upstream for a long time, I rebased this
series onto the latest upstream before generating the patches. It is
based on

  3857aae53f (Git 2.47-rc0, 2024-09-25)

I have not included a range-diff, as I think it would be messy for
reviewers. Actually, there are not many logic changes in this version.

Thanks,
Jialuo

shejialuo (9):
  ref: initialize "fsck_ref_report" with zero
  builtin/refs: support multiple worktrees check for refs.
  ref: port git-fsck(1) regular refs check for files backend
  ref: add more strict checks for regular refs
  ref: add basic symref content check for files backend
  ref: add escape check for the referent of symref
  ref: enhance escape situation for worktrees
  t0602: add ref content checks for worktrees
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  28 +++
 builtin/refs.c                |  11 +-
 fsck.h                        |   5 +
 refs.c                        |   2 +-
 refs/files-backend.c          | 168 ++++++++++++-
 refs/refs-internal.h          |   2 +-
 t/t0602-reffiles-fsck.sh      | 442 ++++++++++++++++++++++++++++++++++
 7 files changed, 646 insertions(+), 12 deletions(-)

-- 
2.46.2


* [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
@ 2024-09-29  7:15           ` shejialuo
  2024-10-08  7:29             ` Karthik Nayak
  2024-09-29  7:15           ` [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs shejialuo
                             ` (10 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:15 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So we need to always initialize these members to
NULL instead of leaving them pointing anywhere when creating a new
"fsck_ref_report" structure.

The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL (which implicitly 0-initializes the
other members in the struct). It is more customary to use "{ 0 }" to
express that we are 0-initializing everything. To align with the rest
of the codebase, initialize "fsck_ref_report" with zero.
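The two initializer forms can be compared with a small sketch. It
assumes a hypothetical "struct demo_ref_report" whose "path", "oid" and
"referent" members stand in for the real "struct fsck_ref_report"; the
helpers "report_designated()" and "report_zeroed()" are illustrative
only. In C, members not named in an initializer are zero-initialized,
so both forms leave every pointer NULL; "{ 0 }" just says so directly.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for "struct fsck_ref_report": the member names
 * follow the commit message, but this is not the real definition.
 */
struct object_id;

struct demo_ref_report {
	const char *path;
	const struct object_id *oid;
	const char *referent;
};

/* Old style: only "path" is named; C zero-initializes the rest. */
struct demo_ref_report report_designated(void)
{
	struct demo_ref_report report = { .path = NULL };
	return report;
}

/* New style: "{ 0 }" states that every member starts as zero/NULL. */
struct demo_ref_report report_zeroed(void)
{
	struct demo_ref_report report = { 0 };
	return report;
}
```

Either way, a later "if (report.oid)" test in the error callback sees
NULL unless the caller sets the member explicitly.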

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0824c0b8a9..03d2503276 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,7 +3520,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.46.2


* [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs.
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
  2024-09-29  7:15           ` [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-09-29  7:15           ` shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
  2024-09-29  7:15           ` [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
                             ` (9 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:15 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already set up the infrastructure to check the consistency of
refs, but it does not support multiple worktrees. As we are going to add
more checks for ref content, we need to support multiple worktrees. Use
"get_worktrees" and "get_worktree_ref_store" to check the refs of every
worktree.

Because "packed-refs" should be checked only once, call the fsck
function of the packed backend only for the main worktree. To let the
user know which directory is being checked, print this information by
default instead of only with "--verbose".

This progress information does not belong on stderr, so print it to
stdout instead.
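A simplified model of the per-worktree loop is sketched below. It uses
hypothetical "demo_*" names instead of the real "get_worktrees()" and
"refs_fsck()" API: each worktree contributes its loose-ref error count,
and the shared packed-refs result is counted only for the store whose
gitdir matches the repository's gitdir, mirroring the "strcmp()" test
added to "files_fsck()".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-worktree ref store. */
struct demo_ref_store {
	const char *gitdir;
};

/* packed-refs is shared, so only the main worktree's store checks it. */
static int demo_checks_packed_refs(const struct demo_ref_store *store,
				   const char *repo_gitdir)
{
	return !strcmp(store->gitdir, repo_gitdir);
}

/*
 * Walk all worktree ref stores like the loop in cmd_refs_verify(),
 * summing per-store error counts (loose_errors[i] for store i) and
 * adding packed_errors once, for the main worktree only.
 */
int demo_verify_all(const struct demo_ref_store *stores,
		    const int *loose_errors, size_t nr,
		    const char *repo_gitdir, int packed_errors)
{
	int ret = 0;

	for (size_t i = 0; i < nr; i++) {
		ret += loose_errors[i];
		if (demo_checks_packed_refs(&stores[i], repo_gitdir))
			ret += packed_errors;
	}
	return ret;
}
```

With a main worktree and two linked worktrees, a broken packed-refs
file therefore adds one error in total rather than one per worktree.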

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/refs.c           | 11 ++++++--
 refs/files-backend.c     | 18 ++++++++----
 t/t0602-reffiles-fsck.sh | 59 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/builtin/refs.c b/builtin/refs.c
index 24978a7b7b..3c492ea922 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "refs.h"
 #include "strbuf.h"
+#include "worktree.h"
 
 #define REFS_MIGRATE_USAGE \
 	N_("git refs migrate --ref-format=<format> [--dry-run]")
@@ -66,6 +67,7 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
 static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 {
 	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
+	struct worktree **worktrees, **p;
 	const char * const verify_usage[] = {
 		REFS_VERIFY_USAGE,
 		NULL,
@@ -75,7 +77,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
 		OPT_END(),
 	};
-	int ret;
+	int ret = 0;
 
 	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
 	if (argc)
@@ -84,9 +86,14 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 	git_config(git_fsck_config, &fsck_refs_options);
 	prepare_repo_settings(the_repository);
 
-	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
+	worktrees = get_worktrees();
+	for (p = worktrees; *p; p++) {
+		struct worktree *wt = *p;
+		ret += refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options);
+	}
 
 	fsck_options_clear(&fsck_refs_options);
+	free_worktrees(worktrees);
 	return ret;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 03d2503276..57318b4c4e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3558,7 +3558,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
 			if (o->verbose)
-				fprintf_ln(stderr, "Checking %s/%s",
+				fprintf_ln(stdout, "Checking %s/%s",
 					   refs_check_dir, iter->relative_path);
 			for (size_t i = 0; fsck_refs_fn[i]; i++) {
 				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
@@ -3589,8 +3589,8 @@ static int files_fsck_refs(struct ref_store *ref_store,
 		NULL,
 	};
 
-	if (o->verbose)
-		fprintf_ln(stderr, _("Checking references consistency"));
+	fprintf_ln(stdout, _("Checking references consistency in %s"),
+		   ref_store->gitdir);
 	return files_fsck_refs_dir(ref_store, o,  "refs", fsck_refs_fn);
 }
 
@@ -3600,8 +3600,16 @@ static int files_fsck(struct ref_store *ref_store,
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ, "fsck");
 
-	return files_fsck_refs(ref_store, o) |
-	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
+	int ret = files_fsck_refs(ref_store, o);
+
+	/*
+	 * packed-refs should only be checked once because it is shared
+	 * between all worktrees.
+	 */
+	if (!strcmp(ref_store->gitdir, ref_store->repo->gitdir))
+		ret += refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
+
+	return ret;
 }
 
 struct ref_storage_be refs_be_files = {
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..4c6cd6f7d0 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,63 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'ref name check should work for multiple worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	cd repo &&
+	test_commit initial &&
+	git checkout -b branch-1 &&
+	test_commit second &&
+	git checkout -b branch-2 &&
+	test_commit third &&
+	git checkout -b branch-3 &&
+	git worktree add ./worktree-1 branch-1 &&
+	git worktree add ./worktree-2 branch-2 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+
+	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/.branch-2 &&
+	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/@ &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/worktree/.branch-2: badRefName: invalid refname format
+	error: refs/worktree/@: badRefName: invalid refname format
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err &&
+
+	(
+		cd worktree-1 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/worktree/.branch-2: badRefName: invalid refname format
+		error: refs/worktree/@: badRefName: invalid refname format
+		EOF
+		sort err >sorted_err &&
+		test_cmp expect sorted_err
+	) &&
+
+	(
+		cd worktree-2 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/worktree/.branch-2: badRefName: invalid refname format
+		error: refs/worktree/@: badRefName: invalid refname format
+		EOF
+		sort err >sorted_err &&
+		test_cmp expect sorted_err
+	)
+'
+
 test_done
-- 
2.46.2


* [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
  2024-09-29  7:15           ` [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-09-29  7:15           ` [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs shejialuo
@ 2024-09-29  7:15           ` shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-08  7:43             ` Karthik Nayak
  2024-09-29  7:16           ` [PATCH v5 4/9] ref: add more strict checks for regular refs shejialuo
                             ` (8 subsequent siblings)
  11 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:15 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

"git-fsck(1)" has some consistency checks for regular refs. As we want
to align the checks "git refs verify" performs with them (and eventually
call the unified code that checks refs from both), port the logic
"git-fsck" has to "git refs verify".

"git-fsck(1)" reports an error when the ref content is invalid.
Following this, add a similar check to "git refs verify", along with a
new fsck error message "badRefContent(ERROR)" to report that a ref has
invalid content.
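The rule that decides "badRefContent" for a regular ref can be sketched
as follows. This is an illustration, not the real
"parse_loose_ref_contents()": it assumes a SHA-1 repository (40 hex
digits) and uses a hypothetical "demo_parse_regular_ref()" helper.
Content is rejected when it does not start with a full hex object name,
or when that name is followed by anything other than the end of the
buffer or whitespace. That is why "$(git rev-parse main)x" and
"xfsazqfxcadas" in this patch's tests are reported.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption for this sketch: SHA-1 repositories, i.e. 40 hex digits. */
#define DEMO_HEXSZ 40

/*
 * Simplified model of how a regular (non-symbolic) loose ref is
 * accepted: a full hex object name followed by either the end of the
 * buffer or a whitespace character. Returns 0 on success and stores
 * the position just after the hex digits in *trailing; returns -1 for
 * content that would be reported as badRefContent.
 */
int demo_parse_regular_ref(const char *buf, const char **trailing)
{
	/* A short buffer stops at its NUL, which is not a hex digit. */
	for (size_t i = 0; i < DEMO_HEXSZ; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return -1;

	if (buf[DEMO_HEXSZ] != '\0' &&
	    !isspace((unsigned char)buf[DEMO_HEXSZ]))
		return -1;

	*trailing = buf + DEMO_HEXSZ;
	return 0;
}
```

Note that "<oid> garbage" still parses here; catching such trailing
content is the job of the stricter checks in the next patch.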

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 45 ++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 66 +++++++++++++++++++++++++++++++++++
 4 files changed, 115 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 57318b4c4e..35b3fa983e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3504,6 +3504,50 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	struct fsck_ref_report report = { 0 };
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "cannot read ref file");
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		strbuf_rtrim(&ref_content);
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "%s", ref_content.buf);
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3586,6 +3630,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 4c6cd6f7d0..628f9bcc46 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -148,4 +148,70 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 	)
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	bad_content=$(git rev-parse main)x &&
+	printf "%s" $bad_content >$tag_dir_prefix/tag-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content
+	EOF
+	rm $tag_dir_prefix/tag-bad-1 &&
+	test_cmp expect err &&
+
+	bad_content=xfsazqfxcadas &&
+	printf "%s" $bad_content >$tag_dir_prefix/tag-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content
+	EOF
+	rm $tag_dir_prefix/tag-bad-2 &&
+	test_cmp expect err &&
+
+	bad_content=Xfsazqfxcadas &&
+	printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	bad_content_1=$(git rev-parse main)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
+	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
+	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.2


* [PATCH v5 4/9] ref: add more strict checks for regular refs
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (2 preceding siblings ...)
  2024-09-29  7:15           ` [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-09-29  7:16           ` shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
  2024-09-29  7:16           ` [PATCH v5 5/9] ref: add basic symref content check for files backend shejialuo
                             ` (7 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:16 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We already use the "parse_loose_ref_contents" function to check whether
the ref content is valid in the files backend. However,
"parse_loose_ref_contents" allows the ref's content to end with garbage
or without a newline.

Even though we never create such loose refs ourselves, we have always
accepted them, so it is entirely possible that some third-party tools
rely on such loose refs being valid. We should therefore not report an
error fsck message for now. Instead, we should notify users about such
"curiously formatted" loose refs so that adequate care is taken before
we decide to tighten the rules in the future.

Reporting a warning fsck message is not suitable either. We don't yet
want the "--strict" flag that controls this bit to end up generating
errors for such weirdly-formatted reference contents, as we first want
to assess whether this retroactive tightening will cause issues for any
tools out there; it may cause compatibility issues that break existing
repositories. So we add the "unofficialFormattedRef(INFO)" fsck message
to represent the situation where the ref was not written in the format
that Git itself creates, and to notify users that it may become an
error in the future.

It might appear that we cannot warn the user at all by using
FSCK_INFO. However, "fsck.c::fsck_vreport" converts FSCK_INFO to
FSCK_WARN, so we can still warn the user about these situations when
running "git refs verify" without introducing compatibility issues.
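The two new checks on the "trailing" pointer classify a regular ref as
sketched below. The enum and "demo_classify_trailing()" are
hypothetical names for this sketch; the two conditions are the ones
used in "files_fsck_refs_content()" in this patch's diff.

```c
/* Hypothetical result codes for this sketch. */
enum demo_ref_format {
	DEMO_REF_OK,               /* "<oid>\n": the official format */
	DEMO_REF_MISSING_LF,       /* "misses LF at the end" */
	DEMO_REF_TRAILING_GARBAGE, /* "has trailing garbage: ..." */
};

/*
 * "trailing" points just past the object name, as returned through
 * the new "trailing" out-parameter of parse_loose_ref_contents().
 */
enum demo_ref_format demo_classify_trailing(const char *trailing)
{
	/* Nothing after the object name: the final LF is missing. */
	if (!*trailing)
		return DEMO_REF_MISSING_LF;
	/* Anything other than exactly one LF counts as trailing garbage. */
	if (*trailing != '\n' || *(trailing + 1))
		return DEMO_REF_TRAILING_GARBAGE;
	return DEMO_REF_OK;
}
```

Under this logic, "\n\n\n" is reported as trailing garbage even though
it contains only newlines, matching the "tag-garbage-1" test case.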

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  8 +++++
 fsck.h                        |  1 +
 refs.c                        |  2 +-
 refs/files-backend.c          | 26 +++++++++++++--
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 59 +++++++++++++++++++++++++++++++++++
 6 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 22c385ea22..e310b5bce9 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -179,6 +179,14 @@
 `unknownType`::
 	(ERROR) Found an unknown object type.
 
+`unofficialFormattedRef`::
+	(INFO) The content of a loose ref file is not in the official
+	format such as not having a LF at the end or having trailing
+	garbage. As valid implementations of Git never created such a
+	loose ref file, it may become an error in the future. Report
+	to the git@vger.kernel.org mailing list if you see this error,
+	as we need to know what tools created such a file.
+
 `unterminatedHeader`::
 	(FATAL) Missing end-of-line in the object header.
 
diff --git a/fsck.h b/fsck.h
index 0d99a87911..7420add5c0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -85,6 +85,7 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(UNOFFICIAL_FORMATTED_REF, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 5f729ed412..6ba1bb1aa1 100644
--- a/refs.c
+++ b/refs.c
@@ -1788,7 +1788,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 35b3fa983e..b2a790c884 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -568,7 +568,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -605,7 +605,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -627,6 +627,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3513,6 +3517,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
+	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
 	struct object_id oid;
@@ -3533,7 +3538,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
 				     ref_content.buf, &oid, &referent,
-				     &type, &failure_errno)) {
+				     &type, &trailing, &failure_errno)) {
 		strbuf_rtrim(&ref_content);
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
@@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
+	if (!(type & REF_ISSYMREF)) {
+		if (!*trailing) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+					      "misses LF at the end");
+			goto cleanup;
+		}
+		if (*trailing != '\n' || *(trailing + 1)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+					      "has trailing garbage: '%s'", trailing);
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 628f9bcc46..2f5c4a1926 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -185,6 +185,61 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
 	EOF
 	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err &&
+
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: unofficialFormattedRef: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-1: unofficialFormattedRef: has trailing garbage: '\''
+
+
+	'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-1 &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-2: unofficialFormattedRef: has trailing garbage: '\''
+
+
+	  garbage'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-2 &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-3: unofficialFormattedRef: has trailing garbage: '\''    garbage
+	a'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-3 &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
+	test_must_fail git -c fsck.unofficialFormattedRef=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-garbage-4: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-4 &&
 	test_cmp expect err
 '
 
@@ -203,12 +258,16 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
 	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
 	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
 	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
 	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	warning: refs/heads/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
+	warning: refs/heads/branch-no-newline: unofficialFormattedRef: misses LF at the end
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
-- 
2.46.2


* [PATCH v5 5/9] ref: add basic symref content check for files backend
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (3 preceding siblings ...)
  2024-09-29  7:16           ` [PATCH v5 4/9] ref: add more strict checks for regular refs shejialuo
@ 2024-09-29  7:16           ` shejialuo
  2024-10-08  7:58             ` Karthik Nayak
  2024-09-29  7:16           ` [PATCH v5 6/9] ref: add escape check for the referent of symref shejialuo
                             ` (6 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:16 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have code that checks regular ref contents, but we do not yet check
the contents of symbolic refs. By using "parse_loose_ref_contents" on
symbolic refs, we can obtain the "referent".

We do not need to check the "referent" by opening the file. This is
because if the "referent" exists in the file system, we will eventually
check its correctness when inspecting every file in the "refs"
directory. If the "referent" does not exist in the file system, that is
fine, as it is treated as a dangling symref.

So we only need to check the "referent" string itself. A regular file is
accepted as a textual symref if it begins with "ref:", followed by zero
or more whitespace characters, followed by the full refname, followed
only by whitespace characters. However, we always write a single SP
after "ref:" and a single LF after the refname. It may seem that we
should report an fsck error when the content does not follow these
stricter rules, but we should not be so aggressive, because third-party
reimplementations of Git may have taken advantage of the looser syntax.
More specifically, we accept the following symref contents:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

When introducing the regular ref content checks, we created a new fsck
message "unofficialFormattedRef" which represents exactly the above
situations. So we reuse this fsck message to inform the user about
them.

But we do not allow any other trailing garbage. The following are bad
symref contents which will be reported as fsck errors by "git-fsck(1)".

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

We also introduce a new "badReferent(ERROR)" fsck message to report the
above errors by using "refs.c::check_refname_format". But we cannot just
pass the "referent" to this function, because the "referent" might
contain trailing whitespace, which would cause "check_refname_format"
to fail.
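For illustration only, here is a minimal C sketch of a subset of the refname rules documented in git-check-ref-format(1). It is not Git's "check_refname_format" and ignores the flags that function takes; "component_ok" and "refname_ok" are hypothetical names. It shows why an untrimmed referent such as "refs/heads/branch\n" is rejected: the LF counts as an ASCII control character.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Check one slash-separated component (hypothetical helper). */
static bool component_ok(const char *s, size_t len)
{
	if (len == 0)
		return false; /* leading, trailing or doubled '/' */
	if (s[0] == '.')
		return false; /* component must not begin with '.' */
	if (len >= 5 && memcmp(s + len - 5, ".lock", 5) == 0)
		return false; /* component must not end with ".lock" */
	return true;
}

/*
 * Subset of the git-check-ref-format(1) rules, as if run without
 * --allow-onelevel, --refspec-pattern or --normalize.
 */
static bool refname_ok(const char *name)
{
	const char *start, *p;
	size_t n;
	int components = 0;

	assert(name != NULL);
	n = strlen(name);
	if (n == 0 || strcmp(name, "@") == 0 || name[n - 1] == '.')
		return false; /* empty, lone "@", or trailing '.' */
	if (strstr(name, "..") || strstr(name, "@{"))
		return false; /* ".." and "@{" are forbidden anywhere */

	for (start = p = name; ; p++) {
		unsigned char c = (unsigned char)*p;

		if (c == '/' || c == '\0') {
			if (!component_ok(start, (size_t)(p - start)))
				return false;
			components++;
			if (c == '\0')
				break;
			start = p + 1;
		} else if (c < 040 || c == 0177 || strchr(" ~^:?*[\\", c)) {
			return false; /* control chars, space, special chars */
		}
	}
	return components >= 2; /* at least one '/' is required */
}
```

Trimming first matters because the LF that Git itself writes after the refname would otherwise make every well-formed textual symref look invalid.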

In order to add these checks, we do the following:

1. Record the untrimmed length "orig_len" and the untrimmed last byte
   "orig_last_byte".
2. Use "strbuf_rtrim" to trim trailing whitespace and newlines so that
   "check_refname_format" does not fail because of them.
3. Use "orig_len" and "orig_last_byte" to check whether the "referent"
   is missing a '\n' at the end or has trailing whitespace or newlines.
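The three steps can be sketched in standalone C on a plain NUL-terminated buffer holding the part after "ref:" and its leading whitespace. This is an illustration, not the patch code: "classify_referent" and the two flag macros are hypothetical stand-ins for the two "unofficialFormattedRef" warnings, and the empty-buffer guard is this sketch's own addition.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result flags standing in for the two warnings. */
#define REFERENT_MISSES_LF   (1 << 0)
#define REFERENT_TRAILING_WS (1 << 1)

/*
 * Classify a referent and right-trim it in place, roughly as
 * strbuf_rtrim() would do on a strbuf.
 */
static int classify_referent(char *buf)
{
	size_t orig_len = strlen(buf);
	size_t len = orig_len;
	char orig_last_byte;
	int flags = 0;

	if (orig_len == 0)
		return 0; /* avoid indexing before an empty buffer */

	/* Step 1: remember the untrimmed length and last byte. */
	orig_last_byte = buf[orig_len - 1];

	/* Step 2: drop trailing whitespace, including LFs. */
	while (len > 0 && isspace((unsigned char)buf[len - 1]))
		len--;
	buf[len] = '\0';

	/* Step 3a: same condition as "misses LF at the end". */
	if (len == orig_len || (len < orig_len && orig_last_byte != '\n'))
		flags |= REFERENT_MISSES_LF;

	/* Step 3b: more than one trailing byte was trimmed. */
	if (len != orig_len && len != orig_len - 1)
		flags |= REFERENT_TRAILING_WS;

	return flags;
}
```

The cases exercised by the t0602 tests map onto these flags directly. One consequence of this logic worth noting: a referent with exactly one trailing space and no LF (e.g. "refs/heads/branch ") trips only the first check.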

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 40 +++++++++++++++
 t/t0602-reffiles-fsck.sh      | 97 +++++++++++++++++++++++++++++++++++
 4 files changed, 141 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index e310b5bce9..e0e4519334 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferent`::
+	(ERROR) The referent of a ref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index 7420add5c0..979d75cb53 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b2a790c884..57ac466b64 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3508,6 +3508,43 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent)
+{
+	char orig_last_byte;
+	size_t orig_len;
+	int ret = 0;
+
+	orig_len = referent->len;
+	orig_last_byte = referent->buf[orig_len - 1];
+	strbuf_rtrim(referent);
+
+	if (check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT,
+				      "points to invalid refname '%s'", referent->buf);
+		goto out;
+	}
+
+
+	if (referent->len == orig_len ||
+	    (referent->len < orig_len && orig_last_byte != '\n')) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+				      "misses LF at the end");
+	}
+
+	if (referent->len != orig_len && referent->len != orig_len - 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+				      "has trailing whitespaces or newlines");
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
@@ -3559,6 +3596,9 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "has trailing garbage: '%s'", trailing);
 			goto cleanup;
 		}
+	} else {
+		ret = files_fsck_symref_target(o, &report, &referent);
+		goto cleanup;
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 2f5c4a1926..718f6abb71 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -273,4 +273,101 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline-1: unofficialFormattedRef: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferent: points to invalid refname '\''refs/heads/.branch'\''
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferent: points to invalid refname '\''refs/heads/.branch'\''
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-2: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/branch-no-newline-1: unofficialFormattedRef: misses LF at the end
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.2



* [PATCH v5 6/9] ref: add escape check for the referent of symref
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (4 preceding siblings ...)
  2024-09-29  7:16           ` [PATCH v5 5/9] ref: add basic symref content check for files backend shejialuo
@ 2024-09-29  7:16           ` shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
  2024-09-29  7:17           ` [PATCH v5 7/9] ref: enhance escape situation for worktrees shejialuo
                             ` (5 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:16 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Ideally, we want users to use "git symbolic-ref" to create symrefs
instead of writing raw contents into the filesystem. However, "git
symbolic-ref" is strict with the refname but not strict with the
referent. For example, we can make the "referent" point to
"$(gitdir)/logs/aaa", manually write an object ID into that file, and
still successfully resolve this symref by using "git rev-parse".

  $ git init repo && cd repo && git commit --allow-empty -mx
  $ git symbolic-ref refs/heads/test logs/aaa
  $ echo $(git rev-parse HEAD) > .git/logs/aaa
  $ git rev-parse test

We may need to add some restrictions on the "referent" parameter when
using "git symbolic-ref" to create symrefs, because ideally all the
non-pseudo-refs should be located under the "refs" directory, and we may
tighten this in the future.

To tell the user that we may tighten the rules for such "escaping"
referents, create a new fsck message "escapeReferent" to notify the
user that this may become an error in the future.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  8 ++++++++
 fsck.h                        |  1 +
 refs/files-backend.c          |  7 +++++++
 t/t0602-reffiles-fsck.sh      | 18 ++++++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index e0e4519334..223974057d 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -52,6 +52,14 @@
 `emptyName`::
 	(WARN) A path contains an empty name.
 
+`escapeReferent`::
+	(INFO) The referent of a symref is outside the "refs" directory.
+	Although we allow creating a symref pointing to a referent which
+	is outside "refs" by using `git symbolic-ref`, we may tighten
+	the rule in the future. Report to the git@vger.kernel.org
+	mailing list if you see this error, as we need to know what tools
+	created such a file.
+
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
diff --git a/fsck.h b/fsck.h
index 979d75cb53..5ecee0fda5 100644
--- a/fsck.h
+++ b/fsck.h
@@ -80,6 +80,7 @@ enum fsck_msg_type {
 	FUNC(LARGE_PATHNAME, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(BAD_FILEMODE, INFO) \
+	FUNC(ESCAPE_REFERENT, INFO) \
 	FUNC(GITMODULES_PARSE, INFO) \
 	FUNC(GITIGNORE_SYMLINK, INFO) \
 	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 57ac466b64..bd215c8d08 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	orig_last_byte = referent->buf[orig_len - 1];
 	strbuf_rtrim(referent);
 
+	if (!starts_with(referent->buf, "refs/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_ESCAPE_REFERENT,
+				      "referent '%s' is outside of refs/",
+				      referent->buf);
+	}
+
 	if (check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT,
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 718f6abb71..585f562245 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -370,4 +370,22 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref should be checked whether it is escaped' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs-back/heads/main\n" >$branch_dir_prefix/branch-bad-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-bad-1: escapeReferent: referent '\''refs-back/heads/main'\'' is outside of refs/
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.2



* [PATCH v5 7/9] ref: enhance escape situation for worktrees
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (5 preceding siblings ...)
  2024-09-29  7:16           ` [PATCH v5 6/9] ref: add escape check for the referent of symref shejialuo
@ 2024-09-29  7:17           ` shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
  2024-09-29  7:17           ` [PATCH v5 8/9] t0602: add ref content checks " shejialuo
                             ` (4 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We do allow users to use "git symbolic-ref" to create symrefs which
point to one of the linked worktrees from the primary worktree or one of
the linked worktrees.

We should not report an escape to the user in the above situation. So,
enhance the "files_fsck_symref_target" function to also check whether
the "referent" starts with "worktrees/", to make sure that we won't warn
the user when symrefs point to a "referent" in the linked worktrees.
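A minimal C sketch of the resulting prefix test, assuming the referent has already been trimmed. "has_prefix" is a local stand-in for Git's starts_with() helper, and "referent_escapes" is a hypothetical name; neither is part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for Git's starts_with() helper. */
static bool has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * True when a trimmed referent would trigger "escapeReferent", i.e. it
 * is neither under "refs/" nor under "worktrees/". The trailing '/' in
 * each prefix keeps "refs-back/heads/main" from counting as "refs/".
 */
static bool referent_escapes(const char *referent)
{
	return !has_prefix(referent, "refs/") &&
	       !has_prefix(referent, "worktrees/");
}
```

With this check, the cross-worktree symrefs created in the new test, such as "worktrees/worktree-1/refs/worktree/w1-branch", no longer produce a warning, while "refs-back/heads/main" still does.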

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     |  5 +++--
 t/t0602-reffiles-fsck.sh | 34 +++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index bd215c8d08..1182bca108 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,10 +3520,11 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	orig_last_byte = referent->buf[orig_len - 1];
 	strbuf_rtrim(referent);
 
-	if (!starts_with(referent->buf, "refs/")) {
+	if (!starts_with(referent->buf, "refs/") &&
+	    !starts_with(referent->buf, "worktrees/")) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_ESCAPE_REFERENT,
-				      "referent '%s' is outside of refs/",
+				      "referent '%s' is outside of refs/ or worktrees/",
 				      referent->buf);
 	}
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 585f562245..936448f780 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -382,10 +382,42 @@ test_expect_success 'textual symref should be checked whether it is escaped' '
 	printf "ref: refs-back/heads/main\n" >$branch_dir_prefix/branch-bad-1 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/branch-bad-1: escapeReferent: referent '\''refs-back/heads/main'\'' is outside of refs/
+	warning: refs/heads/branch-bad-1: escapeReferent: referent '\''refs-back/heads/main'\'' is outside of refs/ or worktrees/
 	EOF
 	rm $branch_dir_prefix/branch-bad-1 &&
 	test_cmp expect err
 '
 
+test_expect_success 'textual symref escape check should work with worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+
+	(
+		cd worktree-1 &&
+		git branch refs/worktree/w1-branch &&
+		git symbolic-ref refs/worktree/branch-4 refs/heads/branch-1 &&
+		git symbolic-ref refs/worktree/branch-5 worktrees/worktree-2/refs/worktree/w2-branch
+	) &&
+	(
+		cd worktree-2 &&
+		git branch refs/worktree/w2-branch &&
+		git symbolic-ref refs/worktree/branch-4 refs/heads/branch-1 &&
+		git symbolic-ref refs/worktree/branch-5 worktrees/worktree-1/refs/worktree/w1-branch
+	) &&
+
+
+	git symbolic-ref refs/heads/branch-5 worktrees/worktree-1/refs/worktree/w1-branch &&
+	git symbolic-ref refs/heads/branch-6 worktrees/worktree-2/refs/worktree/w2-branch &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err
+'
+
 test_done
-- 
2.46.2



* [PATCH v5 8/9] t0602: add ref content checks for worktrees
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (6 preceding siblings ...)
  2024-09-29  7:17           ` [PATCH v5 7/9] ref: enhance escape situation for worktrees shejialuo
@ 2024-09-29  7:17           ` shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
  2024-09-29  7:17           ` [PATCH v5 9/9] ref: add symlink ref content check for files backend shejialuo
                             ` (3 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already added content tests, but we don't have tests for
repositories that have worktrees. Add a new test to exercise all the
checks we have added in the context of worktrees.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 t/t0602-reffiles-fsck.sh | 66 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 936448f780..97bbcd3f13 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -420,4 +420,70 @@ test_expect_success 'textual symref escape check should work with worktrees' '
 	test_must_be_empty err
 '
 
+test_expect_success 'all textual symref checks should work with worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	bad_content_1=$(git rev-parse HEAD)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+
+	printf "%s" $bad_content_1 >$worktree1_refdir_prefix/bad-branch-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/worktree/bad-branch-1: badRefContent: $bad_content_1
+	EOF
+	rm $worktree1_refdir_prefix/bad-branch-1 &&
+	test_cmp expect err &&
+
+	printf "%s" $bad_content_2 >$worktree2_refdir_prefix/bad-branch-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/worktree/bad-branch-2: badRefContent: $bad_content_2
+	EOF
+	rm $worktree2_refdir_prefix/bad-branch-2 &&
+	test_cmp expect err &&
+
+	printf "%s" $bad_content_3 >$worktree1_refdir_prefix/bad-branch-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/worktree/bad-branch-3: badRefContent: $bad_content_3
+	EOF
+	rm $worktree1_refdir_prefix/bad-branch-3 &&
+	test_cmp expect err &&
+
+	printf "%s" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/worktree/branch-no-newline: unofficialFormattedRef: misses LF at the end
+	EOF
+	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse HEAD)" >$worktree2_refdir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/worktree/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $worktree2_refdir_prefix/branch-garbage
+'
+
 test_done
-- 
2.46.2



* [PATCH v5 9/9] ref: add symlink ref content check for files backend
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (7 preceding siblings ...)
  2024-09-29  7:17           ` [PATCH v5 8/9] t0602: add ref content checks " shejialuo
@ 2024-09-29  7:17           ` shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
  2024-09-30 18:57           ` [PATCH v5 0/9] add " Junio C Hamano
                             ` (2 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-09-29  7:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already introduced "files_fsck_symref_target". We should reuse
this function to handle the symrefs which use legacy symbolic links. We
should not check for trailing garbage in symlink refs. Add a new
parameter "symbolic_link" to disable the checks which should only be
executed for textual symrefs.

We first use "strbuf_add_real_path" to resolve the symlink and store
the absolute path that the symlink ref points to in "ref_content". Then
we compute "abs_gitdir", the absolute path of the "gitdir", and strip it
as a prefix from "ref_content" to extract the relative path "referent".
If "ref_content" is outside of "gitdir", we just use "ref_content"
itself as the "referent". Thus, we can reuse the
"files_fsck_symref_target" function to seamlessly check symlink
refs.
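As a rough illustration of turning a symlink target into a gitdir-relative referent, here is a C sketch that works purely lexically: it joins a relative link target with the link's directory and folds "." and ".." components. This is a deliberate simplification, not what the patch does. The patch calls strbuf_add_real_path(), which also resolves intermediate symlinks, and falls back to the absolute path when the result lies outside the gitdir, whereas this sketch returns NULL in that case and for absolute targets. The name "lexical_referent" is hypothetical, and strtok_r() assumes a POSIX environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join a relative symlink target with the directory of the link (both
 * relative to the gitdir) and fold "." and ".." lexically. Returns a
 * malloc'd gitdir-relative path, or NULL if the target is absolute,
 * allocation fails, or ".." would climb above the gitdir.
 */
static char *lexical_referent(const char *link_relpath, const char *target)
{
	const char *slash = strrchr(link_relpath, '/');
	size_t dirlen = slash ? (size_t)(slash - link_relpath) + 1 : 0;
	size_t cap = dirlen + strlen(target) + 1;
	char *joined, *out, *tok, *save = NULL;
	size_t len = 0;

	if (target[0] == '/')
		return NULL;
	joined = malloc(cap);
	out = malloc(cap);
	if (!joined || !out)
		goto fail;

	/* e.g. "refs/heads/" + "../../logs/branch-escape" */
	memcpy(joined, link_relpath, dirlen);
	strcpy(joined + dirlen, target);
	out[0] = '\0';

	for (tok = strtok_r(joined, "/", &save); tok;
	     tok = strtok_r(NULL, "/", &save)) {
		if (strcmp(tok, ".") == 0)
			continue;
		if (strcmp(tok, "..") == 0) {
			char *last;

			if (len == 0)
				goto fail; /* would leave the gitdir */
			last = strrchr(out, '/');
			len = last ? (size_t)(last - out) : 0;
			out[len] = '\0';
			continue;
		}
		if (len)
			out[len++] = '/';
		strcpy(out + len, tok);
		len += strlen(tok);
	}
	free(joined);
	return out;

fail:
	free(joined);
	free(out);
	return NULL;
}
```

Fed the link paths and targets from the t0602 symlink tests, this yields the same referents that appear in the expected messages, for example "logs/branch-escape" and "refs/tags/.tag".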

Because we are considering deprecating writing symbolic links, we first
need to assess whether symbolic links may still be in use. So, add a new
fsck message "symlinkRef(INFO)" to make the user aware of this.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  6 +++++
 fsck.h                        |  1 +
 refs/files-backend.c          | 43 ++++++++++++++++++++++++++++-----
 t/t0602-reffiles-fsck.sh      | 45 +++++++++++++++++++++++++++++++++++
 4 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 223974057d..ffe9d6a2f6 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -184,6 +184,12 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`symlinkRef`::
+	(INFO) A symbolic link is used as a symref.  Report to the
+	git@vger.kernel.org mailing list if you see this error, as we
+	are assessing the feasibility of dropping the support for
+	creating symlinks as symrefs.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 5ecee0fda5..f1da5c8a77 100644
--- a/fsck.h
+++ b/fsck.h
@@ -87,6 +87,7 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(UNOFFICIAL_FORMATTED_REF, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1182bca108..5a5327a146 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,6 +1,7 @@
 #define USE_THE_REPOSITORY_VARIABLE
 
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../config.h"
 #include "../copy.h"
 #include "../environment.h"
@@ -3510,15 +3511,18 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
-				    struct strbuf *referent)
+				    struct strbuf *referent,
+				    unsigned int symbolic_link)
 {
 	char orig_last_byte;
 	size_t orig_len;
 	int ret = 0;
 
-	orig_len = referent->len;
-	orig_last_byte = referent->buf[orig_len - 1];
-	strbuf_rtrim(referent);
+	if (!symbolic_link) {
+		orig_len = referent->len;
+		orig_last_byte = referent->buf[orig_len - 1];
+		strbuf_rtrim(referent);
+	}
 
 	if (!starts_with(referent->buf, "refs/") &&
 	    !starts_with(referent->buf, "worktrees/")) {
@@ -3535,6 +3539,9 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
+	if (symbolic_link)
+		goto out;
+
 
 	if (referent->len == orig_len ||
 	    (referent->len < orig_len && orig_last_byte != '\n')) {
@@ -3559,6 +3566,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct dir_iterator *iter)
 {
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
@@ -3571,8 +3579,30 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
 	report.path = refname.buf;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char* relative_referent_path = NULL;
+
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&ref_content, iter->path.buf);
+		skip_prefix(ref_content.buf, abs_gitdir.buf,
+			    &relative_referent_path);
+
+		if (relative_referent_path)
+			strbuf_addstr(&referent, relative_referent_path);
+		else
+			strbuf_addbuf(&referent, &ref_content);
+
+		ret += files_fsck_symref_target(o, &report, &referent, 1);
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		ret = fsck_report_ref(o, &report,
@@ -3605,7 +3635,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 			goto cleanup;
 		}
 	} else {
-		ret = files_fsck_symref_target(o, &report, &referent);
+		ret = files_fsck_symref_target(o, &report, &referent, 0);
 		goto cleanup;
 	}
 
@@ -3613,6 +3643,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 97bbcd3f13..be4c064b3c 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -486,4 +486,49 @@ test_expect_success 'all textual symref checks should work with worktrees' '
 	rm $worktree2_refdir_prefix/branch-garbage
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-escape $branch_dir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic: escapeReferent: referent '\''logs/branch-escape'\'' is outside of refs/ or worktrees/
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-bad: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-bad: badReferent: points to invalid refname '\''refs/heads/branch   space'\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-bad &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferent: points to invalid refname '\''refs/tags/.tag'\''
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.46.2



* Re: [PATCH v5 0/9] add ref content check for files backend
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (8 preceding siblings ...)
  2024-09-29  7:17           ` [PATCH v5 9/9] ref: add symlink ref content check for files backend shejialuo
@ 2024-09-30 18:57           ` Junio C Hamano
  2024-10-01  3:40             ` shejialuo
  2024-10-07 12:49           ` shejialuo
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
  11 siblings, 1 reply; 209+ messages in thread
From: Junio C Hamano @ 2024-09-30 18:57 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak

shejialuo <shejialuo@gmail.com> writes:

> Because I do not sync the upstream for a long time. For this series, I
> sync the latest upstream and generate the patch, it is based on
>
>   3857aae53f (Git 2.47-rc0, 2024-09-25)

Does this help to reduce conflicts when merging the topic to say
'next' or 'seen'?  If so, such a rebase and noting it in the cover
letter message, like you just did, is very much appreciated.

If not, please don't ;-).

> And I don't think range-diff is useful, it is messy for the reviewers.
> Actually, there are not so many logic changes in this new version.

OK, so this needs a fresh full review.  Thanks.


* Re: [PATCH v5 0/9] add ref content check for files backend
  2024-09-30 18:57           ` [PATCH v5 0/9] add " Junio C Hamano
@ 2024-10-01  3:40             ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-01  3:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Karthik Nayak

On Mon, Sep 30, 2024 at 11:57:19AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > Because I do not sync the upstream for a long time. For this series, I
> > sync the latest upstream and generate the patch, it is based on
> >
> >   3857aae53f (Git 2.47-rc0, 2024-09-25)
> 
> Does this help to reduce conflicts when merging the topic to say
> 'next' or 'seen'?  If so, such a rebase and noting it in the cover
> letter message, like you just did, is very much appreciated.
> 
> If not, please don't ;-).
> 

Actually, I am sure that there are no conflicts after squashing the
following two patches.

  <xmqqle0gzdyh.fsf_-_@gitster.g>
  <xmqqbk1cz69c.fsf@gitster.g>

The reason I synced with upstream is that the build system (such as
the warnings about unused parameters) and the CI setup have changed.

I will remember this.

Thanks,
Jialuo


* Re: [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs.
  2024-09-29  7:15           ` [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs shejialuo
@ 2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-07  8:42               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  6:58 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Sep 29, 2024 at 03:15:26PM +0800, shejialuo wrote:
> We have already set up the infrastructure to check the consistency of
> refs, but we do not support multiple worktrees. As we are going to add
> more checks for ref content, we need to support multiple worktrees. Use
> "get_worktrees" and "get_worktree_ref_store" to check refs under the
> worktrees.

Makes sense.

> Because "packed-refs" should only be checked once, call the fsck
> function of the packed backend only for the main worktree. So that the
> user knows which directory is being checked, print this information by
> default instead of only with "--verbose".

This change should likely be split out into its own commit with a bit
more explanation.

> It is not suitable to print this information to stderr, so print it to
> stdout instead.

This one, too. Why exactly is it not suitable to print to stderr?

> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  builtin/refs.c           | 11 ++++++--
>  refs/files-backend.c     | 18 ++++++++----
>  t/t0602-reffiles-fsck.sh | 59 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+), 7 deletions(-)
> 
> diff --git a/builtin/refs.c b/builtin/refs.c
> index 24978a7b7b..3c492ea922 100644
> --- a/builtin/refs.c
> +++ b/builtin/refs.c
> @@ -5,6 +5,7 @@
>  #include "parse-options.h"
>  #include "refs.h"
>  #include "strbuf.h"
> +#include "worktree.h"
>  
>  #define REFS_MIGRATE_USAGE \
>  	N_("git refs migrate --ref-format=<format> [--dry-run]")
> @@ -66,6 +67,7 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
>  static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
>  {
>  	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
> +	struct worktree **worktrees, **p;
>  	const char * const verify_usage[] = {
>  		REFS_VERIFY_USAGE,
>  		NULL,
> @@ -75,7 +77,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
>  		OPT_END(),
>  	};
> -	int ret;
> +	int ret = 0;
>  
>  	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
>  	if (argc)
> @@ -84,9 +86,14 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
>  	git_config(git_fsck_config, &fsck_refs_options);
>  	prepare_repo_settings(the_repository);
>  
> -	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
> +	worktrees = get_worktrees();
> +	for (p = worktrees; *p; p++) {
> +		struct worktree *wt = *p;
> +		ret += refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options);
> +	}

I think it is more customary to say `ret |=` instead of `ret +=`.
Otherwise we could at least in theory wrap around and even land at `ret
== 0`, even though this is quite unlikely.
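
A minimal sketch (not Git code) of why `|=` is the safer accumulator.
It assumes, hypothetically, that one per-worktree call returns -1 and
another returns 1: `+=` cancels the two failures out to 0, whereas `|=`
keeps the result nonzero once any call has failed. The function names
are made up for illustration.

```c
#include <stddef.h>

/* Mirrors "ret += refs_fsck(...)": opposite-signed failures can cancel out. */
int accumulate_sum(const int *results, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++)
		ret += results[i];
	return ret;
}

/* Mirrors "ret |= refs_fsck(...)": once any bit is set, it stays set. */
int accumulate_or(const int *results, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++)
		ret |= results[i];
	return ret;
}
```

With `int r[] = { -1, 0, 1 };`, `accumulate_sum(r, 3)` yields 0 while
`accumulate_or(r, 3)` yields -1. Note also that signed overflow is
undefined behavior in C, so repeated `+=` is not even guaranteed to
"wrap around"; `|=` avoids the question entirely.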

>  	fsck_options_clear(&fsck_refs_options);
> +	free_worktrees(worktrees);
>  	return ret;
>  }
>  
[snip]
> @@ -3600,8 +3600,16 @@ static int files_fsck(struct ref_store *ref_store,
>  	struct files_ref_store *refs =
>  		files_downcast(ref_store, REF_STORE_READ, "fsck");
>  
> -	return files_fsck_refs(ref_store, o) |
> -	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
> +	int ret = files_fsck_refs(ref_store, o);
> +
> +	/*
> +	 * packed-refs should only be checked once because it is shared
> +	 * between all worktrees.
> +	 */
> +	if (!strcmp(ref_store->gitdir, ref_store->repo->gitdir))
> +		ret += refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
> +
> +	return ret;
>  }
>  
>  struct ref_storage_be refs_be_files = {

What is the current behaviour? Is it that we verify the packed-refs file
multiple times, or rather that we call `packed_ref_store->be->fsck()`
many times even though we know it won't do anything except for the main
worktree?

If it is the former I very much agree that we should make this
conditional. If it's the latter I'm more in the camp of letting it be
such that if worktrees were to ever gain support for "packed-refs" we
wouldn't have to change anything.

In any case, as proposed I think it would make sense to split this out
into a standalone commit such that these details can be explained in the
commit message.

Patrick

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-09-29  7:15           ` [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-07  8:42               ` shejialuo
  2024-10-08  7:43             ` Karthik Nayak
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  6:58 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Sep 29, 2024 at 03:15:46PM +0800, shejialuo wrote:
> "git-fsck(1)" has some consistency checks for regular refs. As we want
> to align the checks "git refs verify" performs with them (and eventually
> call the unified code that checks refs from both), port the logic
> "git-fsck" has to "git refs verify".

What's missing here is the actual intent of this commit, namely why we
want to align the checks. I assume that this prepares us for calling
`git refs verify` as part of git-fsck(1), but readers not familiar with
the larger picture may be left wondering.

> "git-fsck(1)" will report an error when the ref content is invalid.
> Following this, add a similar check to "git refs verify". Then add a new
> fsck error message "badRefContent(ERROR)" to represent that a ref has
> invalid content.

It would help readers to know where the code is that you're porting over
to `git refs verify` so that one can double check that the port is done
faithfully to the original.

Patrick

* Re: [PATCH v5 4/9] ref: add more strict checks for regular refs
  2024-09-29  7:16           ` [PATCH v5 4/9] ref: add more strict checks for regular refs shejialuo
@ 2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-07  8:44               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  6:58 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Sep 29, 2024 at 03:16:00PM +0800, shejialuo wrote:
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index 22c385ea22..e310b5bce9 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -179,6 +179,14 @@
>  `unknownType`::
>  	(ERROR) Found an unknown object type.
>  
> +`unofficialFormattedRef`::
> +	(INFO) The content of a loose ref file is not in the official
> +	format such as not having a LF at the end or having trailing
> +	garbage. As valid implementations of Git never created such a
> +	loose ref file, it may become an error in the future. Report
> +	to the git@vger.kernel.org mailing list if you see this error,
> +	as we need to know what tools created such a file.
> +

I find "unofficial" to be a tad weird. Do we rather want to say
something like "badRefTrailingGarbage"?

> @@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
>  		goto cleanup;
>  	}
>  
> +	if (!(type & REF_ISSYMREF)) {
> +		if (!*trailing) {
> +			ret = fsck_report_ref(o, &report,
> +					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
> +					      "misses LF at the end");
> +			goto cleanup;
> +		}
> +		if (*trailing != '\n' || *(trailing + 1)) {
> +			ret = fsck_report_ref(o, &report,
> +					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
> +					      "has trailing garbage: '%s'", trailing);
> +			goto cleanup;
> +		}
> +	}
> +

I think we should discern these two error cases and provide different
message IDs.
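
To illustrate the two cases being discussed, here is a hypothetical
sketch that mirrors the `trailing` checks in the hunk above and returns
a distinct status for each case. The enum and function names are
invented; whether each status then maps to its own FSCK_MSG_* ID or both
share one is exactly the open question in this thread.

```c
/* Hypothetical classification of what follows the object ID in a loose ref. */
enum ref_tail_status {
	REF_TAIL_OK,               /* exactly one LF and nothing else */
	REF_TAIL_MISSING_LF,       /* the object ID ends the file */
	REF_TAIL_TRAILING_GARBAGE  /* anything other than a single LF */
};

/* "trailing" points just past the object ID, as in the hunk above. */
enum ref_tail_status classify_ref_tail(const char *trailing)
{
	if (!*trailing)
		return REF_TAIL_MISSING_LF;
	if (trailing[0] != '\n' || trailing[1])
		return REF_TAIL_TRAILING_GARBAGE;
	return REF_TAIL_OK;
}
```

For example, an object ID followed by "\n\n" is classified as trailing
garbage, matching the `*(trailing + 1)` test in the patch.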

Patrick

* Re: [PATCH v5 7/9] ref: enhance escape situation for worktrees
  2024-09-29  7:17           ` [PATCH v5 7/9] ref: enhance escape situation for worktrees shejialuo
@ 2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-07  8:45               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  6:58 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Sep 29, 2024 at 03:17:01PM +0800, shejialuo wrote:
> We do allow users to use "git symbolic-ref" to create symrefs which
> point to one of the linked worktrees from the primary worktree or one of
> the linked worktrees.
> 
> We should not report the escape to the user in the above situation.
> So, enhance the "files_fsck_symref_target" function to check whether
> the "referent" starts with "worktrees/" to make sure that we won't warn
> the user when symrefs point to a "referent" in a linked worktree.

Shouldn't this commit be squashed into the former one, as it immediately
fixes an edge case that was introduced with the parent commit?

Patrick

* Re: [PATCH v5 8/9] t0602: add ref content checks for worktrees
  2024-09-29  7:17           ` [PATCH v5 8/9] t0602: add ref content checks " shejialuo
@ 2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-07  8:45               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  6:58 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Sep 29, 2024 at 03:17:18PM +0800, shejialuo wrote:
> We have already added content tests, but we don't have tests for
> repositories with worktrees. Add a new test that exercises all the
> functionality we have added for worktrees.

I'd squash this commit into the one where you introduced checks for
worktrees. Or if this exercises errors that you have added in subsequent
commits I'd squash it into the respective commit that introduces those
checks.

Patrick

* Re: [PATCH v5 9/9] ref: add symlink ref content check for files backend
  2024-09-29  7:17           ` [PATCH v5 9/9] ref: add symlink ref content check for files backend shejialuo
@ 2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-07  8:45               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  6:58 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Sep 29, 2024 at 03:17:36PM +0800, shejialuo wrote:
> We have already introduced "files_fsck_symref_target". We should reuse
> this function to handle the symrefs which use legacy symbolic links. We
> should not check the trailing garbage for symbolic refs. Add a new
> parameter "symbolic_link" to disable some checks which should only be
> executed for textual symrefs.

You're getting into implementation details before noting what the actual
problem is. So I'd recommend first describing the problem at a higher
level, and then note that we can reuse parts of preexisting infra to
address the issue.

Patrick

* Re: [PATCH v5 6/9] ref: add escape check for the referent of symref
  2024-09-29  7:16           ` [PATCH v5 6/9] ref: add escape check for the referent of symref shejialuo
@ 2024-10-07  6:58             ` Patrick Steinhardt
  2024-10-07  8:44               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  6:58 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote:
> Ideally, we want users to use "git symbolic-ref" to create symrefs
> instead of writing raw content into the filesystem. However, "git
> symbolic-ref" is strict about the refname but not about the referent.
> For example, we can point the "referent" at "$(gitdir)/logs/aaa" and
> manually write content there, and "git rev-parse" can still
> successfully resolve this symref.
> 
>   $ git init repo && cd repo && git commit --allow-empty -mx
>   $ git symbolic-ref refs/heads/test logs/aaa
>   $ echo $(git rev-parse HEAD) > .git/logs/aaa
>   $ git rev-parse test

Oh, curious. This should definitely be tightened in git-symbolic-ref(1)
itself. The target should either be a root ref or something starting
with "refs/". Anyway, that is of course outside of the scope of this
patch series.

> We may need to add some restrictions on the "referent" parameter of
> "git symbolic-ref" when creating symrefs, because ideally all
> non-pseudo refs should be located under the "refs" directory, and we
> may tighten this in the future.

Agreed.

> To let the user know that we may tighten the "escape" situation,
> create a new fsck message "escapeReferent" to notify the user that this
> may become an error in the future.
> 
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  Documentation/fsck-msgids.txt |  8 ++++++++
>  fsck.h                        |  1 +
>  refs/files-backend.c          |  7 +++++++
>  t/t0602-reffiles-fsck.sh      | 18 ++++++++++++++++++
>  4 files changed, 34 insertions(+)
> 
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index e0e4519334..223974057d 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -52,6 +52,14 @@
>  `emptyName`::
>  	(WARN) A path contains an empty name.
>  
> +`escapeReferent`::
> +	(INFO) The referent of a symref is outside the "ref" directory.

Proposal: 'The referent of a symbolic reference points neither to a root
reference nor to a reference starting with "refs/".'

I'd also rename this to e.g. "symrefTargetIsNotAReference" or something
like that, because it's not really about whether or not the referent is
"escaping". It's a bit of a mouthful, but I don't really have a better
name. So feel free to pick something different that describes the error
better.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 57ac466b64..bd215c8d08 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o,
>  	orig_last_byte = referent->buf[orig_len - 1];
>  	strbuf_rtrim(referent);
>  
> +	if (!starts_with(referent->buf, "refs/")) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_ESCAPE_REFERENT,
> +				      "referent '%s' is outside of refs/",
> +				      referent->buf);
> +	}
> +
>  	if (check_refname_format(referent->buf, 0)) {
>  		ret = fsck_report_ref(o, report,
>  				      FSCK_MSG_BAD_REFERENT,

This check is invalid, because referents can also point to root refs. So
you should probably also add a call to `is_root_ref()` here.

We also have `is_pseudo_ref()`, and one might be tempted to also allow
that. But pseudo refs aren't proper refs, so I'd argue that a symref
pointing to a pseudo ref is invalid, too.
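
A rough sketch of the referent check with root refs taken into account.
It assumes a simplified notion of root refs (names made only of
uppercase letters, '-' and '_' that are either "HEAD" or end in
"_HEAD") and treats only FETCH_HEAD and MERGE_HEAD as pseudo refs; Git's
real `is_root_ref()` and `is_pseudo_ref()` may differ in detail. The
"worktrees/" exemption follows patch 7/9 of this series. All function
names here are hypothetical.

```c
#include <ctype.h>
#include <string.h>

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

static int has_suffix(const char *s, const char *suffix)
{
	size_t len = strlen(s), slen = strlen(suffix);

	return len >= slen && !strcmp(s + len - slen, suffix);
}

/* Simplified: only FETCH_HEAD and MERGE_HEAD count as pseudo refs here. */
int sketch_is_pseudo_ref(const char *name)
{
	return !strcmp(name, "FETCH_HEAD") || !strcmp(name, "MERGE_HEAD");
}

/*
 * Simplified root-ref test: the name consists only of uppercase letters,
 * '-' and '_', is not a pseudo ref, and is "HEAD" or ends in "_HEAD".
 */
int sketch_is_root_ref(const char *name)
{
	if (!*name)
		return 0;
	for (const char *c = name; *c; c++)
		if (!isupper((unsigned char)*c) && *c != '-' && *c != '_')
			return 0;
	if (sketch_is_pseudo_ref(name))
		return 0;
	return !strcmp(name, "HEAD") || has_suffix(name, "_HEAD");
}

/* Returns nonzero if the (already rtrimmed) referent looks like a ref. */
int sketch_referent_is_ref(const char *referent)
{
	return has_prefix(referent, "refs/") ||
	       has_prefix(referent, "worktrees/") || /* per patch 7/9 */
	       sketch_is_root_ref(referent);
}
```

With this, the "logs/aaa" referent from the commit message is reported,
while "HEAD" and "ORIG_HEAD" are accepted. Rejecting FETCH_HEAD follows
the argument above that pseudo refs are not proper refs.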

Patrick

* Re: [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs.
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-07  8:42               ` shejialuo
  2024-10-07  9:16                 ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-10-07  8:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 08:58:30AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:15:26PM +0800, shejialuo wrote:
> > We have already set up the infrastructure to check the consistency of
> > refs, but we do not support multiple worktrees. As we are going to add
> > more checks for ref content, we need to support multiple worktrees. Use
> > "get_worktrees" and "get_worktree_ref_store" to check refs under the
> > worktrees.
> 
> Makes sense.
> 
> > Because "packed-refs" should only be checked once, call the fsck
> > function of the packed backend only for the main worktree. So that the
> > user knows which directory is being checked, print this information by
> > default instead of only with "--verbose".
> 
> This change should likely be split out into its own commit with a bit
> more explanation.
> 
> > It is not suitable to print this information to stderr, so print it to
> > stdout instead.
> 
> This one, too. Why exactly is it not suitable to print to stderr?
> 

I am sorry for the confusion. We should not print which directory we are
checking to stderr, because I think this would fill the test scripts
with unrelated output when they use "git refs verify 2>err".

The reason for printing it at all is that, when checking the consistency
of refs in multiple worktrees, ref names can repeat. For example,
worktree A has its own ref called "test" under
".git/worktrees/A/refs/worktree/test" and worktree B has its own ref,
also called "test", under ".git/worktrees/B/refs/worktree/test".

However, both refnames would be printed as "refs/worktree/test", which
would leave the user confused about which "refs/worktree/test" was
checked. So, we should print this information like:

    Checking references consistency in .git
    ...
    Checking references consistency in .git/worktrees/A
    ...
    Checking references consistency in .git/worktrees/B

However, while writing this, I realized that printing ".git" is a poor
choice: it may make the user think that everything under it is checked.
This should be improved in the next version.

> > @@ -75,7 +77,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
> >  		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
> >  		OPT_END(),
> >  	};
> > -	int ret;
> > +	int ret = 0;
> >  
> >  	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
> >  	if (argc)
> > @@ -84,9 +86,14 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
> >  	git_config(git_fsck_config, &fsck_refs_options);
> >  	prepare_repo_settings(the_repository);
> >  
> > -	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
> > +	worktrees = get_worktrees();
> > +	for (p = worktrees; *p; p++) {
> > +		struct worktree *wt = *p;
> > +		ret += refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options);
> > +	}
> 
> I think it is more customary to say `ret |=` instead of `ret +=`.
> Otherwise we could at least in theory wrap around and even land at `ret
> == 0`, even though this is quite unlikely.
> 

I agree here. I will improve this in the next version.

[snip]

> > @@ -3600,8 +3600,16 @@ static int files_fsck(struct ref_store *ref_store,
> >  	struct files_ref_store *refs =
> >  		files_downcast(ref_store, REF_STORE_READ, "fsck");
> >  
> > -	return files_fsck_refs(ref_store, o) |
> > -	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
> > +	int ret = files_fsck_refs(ref_store, o);
> > +
> > +	/*
> > +	 * packed-refs should only be checked once because it is shared
> > +	 * between all worktrees.
> > +	 */
> > +	if (!strcmp(ref_store->gitdir, ref_store->repo->gitdir))
> > +		ret += refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
> > +
> > +	return ret;
> >  }
> >  
> >  struct ref_storage_be refs_be_files = {
> 
> What is the current behaviour? Is it that we verify the packed-refs file
> multiple times, or rather that we call `packed_ref_store->be->fsck()`
> many times even though we know it won't do anything except for the main
> worktree?
> 

That's a good question. I think the latter is the current behaviour: we
call `packed_ref_store->be->fsck()` many times. I understand what you
mean here; we should just put the check into the
`packed_ref_store->be->fsck()` function.

> If it is the former I very much agree that we should make this
> conditional. If it's the latter I'm more in the camp of letting it be
> such that if worktrees were to ever gain support for "packed-refs" we
> wouldn't have to change anything.
> 

I agree.

> In any case, as proposed I think it would make sense to split this out
> into a standalone commit such that these details can be explained in the
> commit message.
> 

Yes, the current commit message lacks details.

> Patrick

Thanks,
Jialuo

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-07  8:42               ` shejialuo
  2024-10-07  9:18                 ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-10-07  8:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 08:58:34AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:15:46PM +0800, shejialuo wrote:
> > "git-fsck(1)" has some consistency checks for regular refs. As we want
> > to align the checks "git refs verify" performs with them (and eventually
> > call the unified code that checks refs from both), port the logic
> > "git-fsck" has to "git refs verify".
> 
> What's missing here is the actual intent of this commit, namely why we
> want to align the checks. I assume that this prepares us for calling
> `git refs verify` as part of git-fsck(1), but readers not familiar with
> the larger picture may be left wondering.
> 

Indeed, I will improve this in the next version.

> > "git-fsck(1)" will report an error when the ref content is invalid.
> > Following this, add a similar check to "git refs verify". Then add a new
> > fsck error message "badRefContent(ERROR)" to represent that a ref has
> > invalid content.
> 
> It would help readers to know where the code is that you're porting over
> to `git refs verify` so that one can double check that the port is done
> faithfully to the original.
> 

I am a little confused here. There is a lot of code in "git-fsck(1)"
that checks ref consistency. How could I accurately point to it in the
commit message?

> Patrick

Thanks,
Jialuo

* Re: [PATCH v5 4/9] ref: add more strict checks for regular refs
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-07  8:44               ` shejialuo
  2024-10-07  9:25                 ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-10-07  8:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 08:58:37AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:16:00PM +0800, shejialuo wrote:
> > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> > index 22c385ea22..e310b5bce9 100644
> > --- a/Documentation/fsck-msgids.txt
> > +++ b/Documentation/fsck-msgids.txt
> > @@ -179,6 +179,14 @@
> >  `unknownType`::
> >  	(ERROR) Found an unknown object type.
> >  
> > +`unofficialFormattedRef`::
> > +	(INFO) The content of a loose ref file is not in the official
> > +	format such as not having a LF at the end or having trailing
> > +	garbage. As valid implementations of Git never created such a
> > +	loose ref file, it may become an error in the future. Report
> > +	to the git@vger.kernel.org mailing list if you see this error,
> > +	as we need to know what tools created such a file.
> > +
> 
> I find "unofficial" to be a tad weird. Do we rather want to say
> something like "badRefTrailingGarbage"?
> 

Well, I will answer this question together with the one below.

> > @@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
> >  		goto cleanup;
> >  	}
> >  
> > +	if (!(type & REF_ISSYMREF)) {
> > +		if (!*trailing) {
> > +			ret = fsck_report_ref(o, &report,
> > +					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
> > +					      "misses LF at the end");
> > +			goto cleanup;
> > +		}
> > +		if (*trailing != '\n' || *(trailing + 1)) {
> > +			ret = fsck_report_ref(o, &report,
> > +					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
> > +					      "has trailing garbage: '%s'", trailing);
> > +			goto cleanup;
> > +		}
> > +	}
> > +
> 
> I think we should discern these two error cases and provide different
> message IDs.
> 

Actually, in previous versions, I mapped one message ID to one error
case. But in v4, Junio asked:

  Not limited to this patch, but isn't fsck_report_ref() misdesigned,
  or is it just they are used poorly in these patches?  In these two
  callsites, the message string parameter does not give any more
  information than what the FSCK_MSG_* enum gives.

  That is what I meant by "misdesigned"---if one message enum always
  corresponds to one human-readable message, there is not much point
  in forcing callers to supply both, is there?

In my opinion, we should have only one message ID here, covering both
trailing garbage and a missing newline at the end. When writing the
code, I chose the name "unofficialFormattedRef" for the following
reasons:

  1. If we used two message IDs here, the description of each one would
  need to tell the user "please report this to the git mailing list".

  2. If we decide to make this an error, we could simply classify both
  cases under the "badRefContent" message ID.

  3. The semantics are correct: these really are curiously formatted
  refs, and the message string eventually tells the user what exactly
  is curious.

So, I think we should not always map one message ID to one error case.

> Patrick

Thanks,
Jialuo

* Re: [PATCH v5 6/9] ref: add escape check for the referent of symref
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-07  8:44               ` shejialuo
  2024-10-07  9:26                 ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-10-07  8:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 08:58:55AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote:
> > Ideally, we want users to use "git symbolic-ref" to create symrefs
> > instead of writing raw content into the filesystem. However, "git
> > symbolic-ref" is strict about the refname but not about the referent.
> > For example, we can point the "referent" at "$(gitdir)/logs/aaa" and
> > manually write content there, and "git rev-parse" can still
> > successfully resolve this symref.
> > 
> >   $ git init repo && cd repo && git commit --allow-empty -mx
> >   $ git symbolic-ref refs/heads/test logs/aaa
> >   $ echo $(git rev-parse HEAD) > .git/logs/aaa
> >   $ git rev-parse test
> 
> Oh, curious. This should definitely be tightened in git-symbolic-ref(1)
> itself. The target should either be a root ref or something starting
> with "refs/". Anyway, that is of course outside of the scope of this
> patch series.
> 

I was curious about this too when I experimented while writing the
code. Junio had told me this could happen, so I dug into it. However,
it's not reasonable. If we want to tighten the rule, we also need to
align "git symbolic-ref" with that behavior. That's another question,
though.

[snip]

> > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> > index e0e4519334..223974057d 100644
> > --- a/Documentation/fsck-msgids.txt
> > +++ b/Documentation/fsck-msgids.txt
> > @@ -52,6 +52,14 @@
> >  `emptyName`::
> >  	(WARN) A path contains an empty name.
> >  
> > +`escapeReferent`::
> > +	(INFO) The referent of a symref is outside the "ref" directory.
> 
> Proposal: 'The referent of a symbolic reference points neither to a root
> reference nor to a reference starting with "refs/".'
> 

That's much better.

> I'd also rename this to e.g. "symrefTargetIsNotAReference" or something
> like that, because it's not really about whether or not the referent is
> "escaping". It's a bit of a mouthful, but I don't really have a better
> name. So feel free to pick something different that describes the error
> better.
> 

I guess "symrefTargetIsNotAReference" is a little too long. If we decide
to turn it into an error later, why not just fold it into the
"badReferent" fsck message?

So, I do not think we need to rename it. As I have mentioned, we don't
need a one-to-one mapping between error cases and fsck message IDs.

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 57ac466b64..bd215c8d08 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o,
> >  	orig_last_byte = referent->buf[orig_len - 1];
> >  	strbuf_rtrim(referent);
> >  
> > +	if (!starts_with(referent->buf, "refs/")) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_ESCAPE_REFERENT,
> > +				      "referent '%s' is outside of refs/",
> > +				      referent->buf);
> > +	}
> > +
> >  	if (check_refname_format(referent->buf, 0)) {
> >  		ret = fsck_report_ref(o, report,
> >  				      FSCK_MSG_BAD_REFERENT,
> 
> This check is invalid, because referents can also point to root refs. So
> you should probably also add a call to `is_root_ref()` here.
> 

Thanks, I missed this case.

> We also have `is_pseudo_ref()`, and one might be tempted to also allow
> that. But pseudo refs aren't proper refs, so I'd argue that a symref
> pointing to a pseudo ref is invalid, too.
> 

I agree.

> Patrick

Thanks,
Jialuo

* Re: [PATCH v5 7/9] ref: enhance escape situation for worktrees
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-07  8:45               ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-07  8:45 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 08:58:40AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:17:01PM +0800, shejialuo wrote:
> > We do allow users to use "git symbolic-ref" to create symrefs which
> > point to one of the linked worktrees from the primary worktree or one of
> > the linked worktrees.
> > 
> > We should not report the escape to the user in the above situation.
> > So, enhance the "files_fsck_symref_target" function to check whether
> > the "referent" starts with "worktrees/" to make sure that we won't warn
> > the user when symrefs point to a "referent" in a linked worktree.
> 
> Shouldn't this commit be squashed into the former one, as it immediately
> fixes an edge case that was introduced with the parent commit?
> 

I partially agree here. I don't think this is an edge case that was
introduced with the parent commit. The reason I used a new commit here
is that I want to emphasize the behavior.

This is because, in v4, Junio asked me about "escapeReferent":

  I am not sure starting this as ERROR is wise.  Users and third-party
  tools make creative uses of the system and I cannot offhand think of
  an argument why it should be forbidden to create a symbolic link to
  our own HEAD or to some worktree-specific ref in another worktree.

Actually, I had never thought we could do this, hence the separate
commit. But I do agree that this commit is closely related to the
parent commit.

I will improve this in the next version.

> Patrick

* Re: [PATCH v5 8/9] t0602: add ref content checks for worktrees
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-07  8:45               ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-07  8:45 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 08:58:43AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:17:18PM +0800, shejialuo wrote:
> > We have already added content tests, but we don't have tests when there
> > are worktrees in the repository. Add a new test to test all the
> > functionalities we have added for worktrees.
> 
> I'd squash this commit into the one where you introduced checks for
> worktrees. Or if this exercises errors that you have added in subsequent
> commits I'd squash it into the respective commit that introduces those
> checks.
> 

Yes, makes sense. I will improve this in the next version.

> Patrick

* Re: [PATCH v5 9/9] ref: add symlink ref content check for files backend
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-07  8:45               ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-07  8:45 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 08:58:50AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:17:36PM +0800, shejialuo wrote:
> > We have already introduced "files_fsck_symref_target". We should reuse
> > this function to handle the symrefs which use legacy symbolic links. We
> > should not check the trailing garbage for symbolic refs. Add a new
> > parameter "symbolic_link" to disable some checks which should only be
> > executed for textual symrefs.
> 
> You're getting into implementation details before noting what the actual
> problem is. So I'd recommend first describing the problem at a higher
> level, and then note that we can reuse parts of preexisting infra to
> address the issue.
> 

Thanks, I will improve this in the next version.
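
Some background on the case this patch handles. In the legacy format, a symref such as HEAD is a symbolic link, not a text file containing "ref: ...". The link's target is a refname like "refs/heads/main". The POSIX sketch below reads such a link with readlink(2) and accepts targets that start with "refs/". `read_symlink_symref()` is hypothetical, and the "refs/" test is an assumption about what counts as a symref target. The target comes straight from the link, so there is no trailing newline or whitespace to validate. That is why the trailing-garbage checks for textual symrefs should be skipped for symlinks.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: read the target of the symlink at `path` into `buf`.
 * Returns 1 if the target looks like a symref target (starts with "refs/"),
 * 0 if it does not, and -1 if the link cannot be read or does not fit.
 */
int read_symlink_symref(const char *path, char *buf, size_t bufsize)
{
	ssize_t len;

	if (bufsize < 2)
		return -1;
	len = readlink(path, buf, bufsize - 1);
	if (len < 0)
		return -1;	/* not a symlink, missing, etc.; see errno */
	if ((size_t)len >= bufsize - 1)
		return -1;	/* target may have been truncated */
	buf[len] = '\0';	/* readlink() does not NUL-terminate */

	/* No "ref:" prefix and no trailing LF/whitespace to check here. */
	return strncmp(buf, "refs/", 5) == 0;
}
```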

> Patrick

* Re: [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs.
  2024-10-07  8:42               ` shejialuo
@ 2024-10-07  9:16                 ` Patrick Steinhardt
  2024-10-07 12:06                   ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  9:16 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 04:42:21PM +0800, shejialuo wrote:
> On Mon, Oct 07, 2024 at 08:58:30AM +0200, Patrick Steinhardt wrote:
> > On Sun, Sep 29, 2024 at 03:15:26PM +0800, shejialuo wrote:
> > > We have already set up the infrastructure to check the consistency for
> > > refs, but we do not support multiple worktrees. As we decide to add more
> > > checks for ref content, we need to set up support for multiple
> > > worktrees. Use "get_worktrees" and "get_worktree_ref_store" to check
> > > refs under the worktrees.
> > 
> > Makes sense.
> > 
> > > Because we should only check once for "packed-refs", let's call the fsck
> > > function for packed-backend when in the main worktree. In order to know
> > > which directory we are checking, we should print this information by
> > > default instead of requiring "--verbose".
> > 
> > This change should likely be evicted into its own commit with a bit more
> > explanation.
> > 
> > > It's not suitable to print this information to stderr, so print it
> > > to stdout instead.
> > 
> > This one, too. Why exactly is it not suitable to print to stderr?
> > 
> 
> Sorry for the confusion. We should not print which directory we are
> checking to stderr, because that would fill test scripts that use
> "git refs verify 2>err" with unrelated output.
> 
> The reason for printing it at all is that, when checking refs in
> multiple worktrees, ref names can repeat. For example, worktree A
> has its own ref called "test" under ".git/worktrees/A/refs/worktree/test"
> and worktree B has its own ref also called "test" under
> ".git/worktrees/B/refs/worktree/test".
> 
> However, the refname would be printed as "refs/worktree/test" in both
> cases, leaving the user unsure which "refs/worktree/test" was checked.
> So we should print this information like:
> 
>     Checking references consistency in .git
>     ...
>     Checking references consistency in .git/worktrees/A
>     ...
>     Checking references consistency in .git/worktrees/B
> 
> However, while writing this, I feel that ".git" is misleading: it
> suggests to the user that everything under it is checked. This should
> be improved in the next version.

But wouldn't it be the better solution if we printed the fully-qualified
reference name "worktrees/worktree/refs/worktree/test" instead? That
would remove the need to say which directory we're currently verifying
in the first place.
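
Here is a minimal sketch of that suggestion. It assumes a linked worktree is identified by its directory name under ".git/worktrees/", and that refs of the main worktree keep their plain name. `worktree_display_refname()` is a hypothetical helper, not an existing git function.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: build the name used when reporting a ref found
 * while checking a given worktree (worktree_id == NULL means the main
 * worktree). Refs of a linked worktree get a "worktrees/<id>/" prefix,
 * so "refs/worktree/test" in worktrees A and B no longer print the same.
 * Returns the snprintf() result; a value >= outlen means truncation.
 */
int worktree_display_refname(char *out, size_t outlen,
			     const char *worktree_id, const char *refname)
{
	if (!worktree_id)
		return snprintf(out, outlen, "%s", refname);
	return snprintf(out, outlen, "worktrees/%s/%s", worktree_id, refname);
}
```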

Patrick

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-07  8:42               ` shejialuo
@ 2024-10-07  9:18                 ` Patrick Steinhardt
  2024-10-07 12:08                   ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  9:18 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 04:42:44PM +0800, shejialuo wrote:
> On Mon, Oct 07, 2024 at 08:58:34AM +0200, Patrick Steinhardt wrote:
> > On Sun, Sep 29, 2024 at 03:15:46PM +0800, shejialuo wrote:
> > > "git-fsck(1)" will report an error when the ref content is invalid.
> > > Following this, add a similar check to "git refs verify". Then add a new
> > > fsck error message "badRefContent(ERROR)" to represent that a ref has
> > > invalid content.
> > 
> > It would help readers to know where the code is that you're porting over
> > to `git refs verify` so that one can double check that the port is done
> > faithfully to the original.
> > 
> 
> I am a little confused here. There is a lot of code in "git-fsck(1)"
> that checks ref consistency. How could I express this accurately
> in the commit message?

Well, you say you ported over a specific consistency check from
git-fsck(1) to `git refs verify` in the commit message. So I assume that
it should match a specific check in git-fsck(1), shouldn't it?

Patrick

* Re: [PATCH v5 4/9] ref: add more strict checks for regular refs
  2024-10-07  8:44               ` shejialuo
@ 2024-10-07  9:25                 ` Patrick Steinhardt
  2024-10-07 12:19                   ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  9:25 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 04:44:16PM +0800, shejialuo wrote:
> On Mon, Oct 07, 2024 at 08:58:37AM +0200, Patrick Steinhardt wrote:
> > On Sun, Sep 29, 2024 at 03:16:00PM +0800, shejialuo wrote:
> > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> > > index 22c385ea22..e310b5bce9 100644
> > > --- a/Documentation/fsck-msgids.txt
> > > +++ b/Documentation/fsck-msgids.txt
> > > @@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
> > >  		goto cleanup;
> > >  	}
> > >  
> > > +	if (!(type & REF_ISSYMREF)) {
> > > +		if (!*trailing) {
> > > +			ret = fsck_report_ref(o, &report,
> > > +					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
> > > +					      "misses LF at the end");
> > > +			goto cleanup;
> > > +		}
> > > +		if (*trailing != '\n' || *(trailing + 1)) {
> > > +			ret = fsck_report_ref(o, &report,
> > > +					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
> > > +					      "has trailing garbage: '%s'", trailing);
> > > +			goto cleanup;
> > > +		}
> > > +	}
> > > +
> > 
> > I think we should discern these two error cases and provide different
> > message IDs.
> > 
> 
> In previous versions, I did map each message ID to exactly one
> error case. But in v4, Junio asked:
> 
>   Not limited to this patch, but isn't fsck_report_ref() misdesigned,
>   or is it just they are used poorly in these patches?  In these two
>   callsites, the message string parameter does not give any more
>   information than what the FSCK_MSG_* enum gives.
> 
>   That is what I meant by "misdesigned"---if one message enum always
>   corresponds to one human-readable message, there is not much point
>   in forcing callers to supply both, is there?
> 
> In my opinion, trailing garbage and a missing trailing newline should
> be reported as a single case. When writing the code, I chose the name
> "unofficialFormattedRef" for the following reasons:
> 
>   1. If we use two message IDs here, then for every message ID we need
>   to tell the user "please report this to the git mailing list".
> 
>   2. If we later decide to make this an error, we could simply
>   classify both under the "badRefContent" message category.
> 
>   3. The semantics are correct: these are genuinely curiously formatted
>   refs, and the message will eventually tell the user what exactly
>   is curious.
> 
> So, I think we should not always map one message ID to one error case.

From my point of view the error codes should be the single source of
truth, as this is what a user can use to disable specific checks. So if
one code maps to multiple messages, users can only disable all of those
messages at once.

I don't disagree with what Junio is saying. It is somewhat redundant
that the caller has to pass both a code and a message in the current
form; it should be sufficient for them to pass the code, and the
message can then e.g. be extracted from a central array that maps codes
to messages.

But you can also make the reverse argument: messages can be dynamic, so
that the caller may include additional details around why specifically
the check failed. The code and message would still be 1:1, but we may
include additional details like that to guide the user.
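
To illustrate the split suggested here, the C sketch below classifies what follows the object ID in a regular ref and returns a distinct ID for each problem. The two checks mirror the conditions in the quoted diff. The enum names are hypothetical, loosely based on the "misses LF at the end" and "trailingRefContent" cases discussed in this thread.

```c
/* Hypothetical message IDs: one per error case, so each can be configured on its own. */
enum ref_msg_id {
	REF_CONTENT_OK,
	REF_MISSING_NEWLINE,	/* e.g. "refMissingNewline" */
	REF_TRAILING_CONTENT,	/* e.g. "trailingRefContent" */
};

/*
 * `trailing` points just past the parsed object ID, like the `trailing`
 * pointer in the quoted diff. A well-formed regular ref ends with exactly
 * one LF and nothing after it.
 */
enum ref_msg_id check_ref_trailing(const char *trailing)
{
	if (!*trailing)
		return REF_MISSING_NEWLINE;
	if (*trailing != '\n' || *(trailing + 1))
		return REF_TRAILING_CONTENT;
	return REF_CONTENT_OK;
}
```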

Patrick

* Re: [PATCH v5 6/9] ref: add escape check for the referent of symref
  2024-10-07  8:44               ` shejialuo
@ 2024-10-07  9:26                 ` Patrick Steinhardt
  0 siblings, 0 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-07  9:26 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 04:44:44PM +0800, shejialuo wrote:
> On Mon, Oct 07, 2024 at 08:58:55AM +0200, Patrick Steinhardt wrote:
> > On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote:
> > I'd also rename this to e.g. "symrefTargetIsNotAReference" or something
> > like that, because it's not really about whether or not the referent is
> > "escaping". It's a bit of a mouthful, but I don't really have a better
> > name. So feel free to pick something different that describes the error
> > better.
> > 
> 
> I guess "symrefTargetIsNotAReference" is a little too long. If we decide
> to convert it to an error later, why not just fold it into the
> "badReferent" fsck message?
> 
> So, I do not think we need to rename it. As I mentioned, we don't
> need a one-to-one mapping between error cases and fsck message IDs.

Mostly because I disagree with this here. I think there should be a 1:1
mapping, and "badReferent" is too generic to provide that.

Patrick

* Re: [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs.
  2024-10-07  9:16                 ` Patrick Steinhardt
@ 2024-10-07 12:06                   ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-07 12:06 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 11:16:19AM +0200, Patrick Steinhardt wrote:

[snip]

> > However, the refname would be printed as "refs/worktree/test" in both
> > cases, leaving the user unsure which "refs/worktree/test" was checked.
> > So we should print this information like:
> > 
> >     Checking references consistency in .git
> >     ...
> >     Checking references consistency in .git/worktrees/A
> >     ...
> >     Checking references consistency in .git/worktrees/B
> > 
> > However, while writing this, I feel that ".git" is misleading: it
> > suggests to the user that everything under it is checked. This should
> > be improved in the next version.
> 
> But wouldn't it be the better solution if we printed the fully-qualified
> reference name "worktrees/worktree/refs/worktree/test" instead? That
> would remove the need to say which directory we're currently verifying
> in the first place.
> 

Good idea. I will do it this way in the next version.

> Patrick

Thanks

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-07  9:18                 ` Patrick Steinhardt
@ 2024-10-07 12:08                   ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-07 12:08 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 11:18:24AM +0200, Patrick Steinhardt wrote:
> On Mon, Oct 07, 2024 at 04:42:44PM +0800, shejialuo wrote:
> > On Mon, Oct 07, 2024 at 08:58:34AM +0200, Patrick Steinhardt wrote:
> > > On Sun, Sep 29, 2024 at 03:15:46PM +0800, shejialuo wrote:
> > > > "git-fsck(1)" will report an error when the ref content is invalid.
> > > > Following this, add a similar check to "git refs verify". Then add a new
> > > > fsck error message "badRefContent(ERROR)" to represent that a ref has
> > > > invalid content.
> > > 
> > > It would help readers to know where the code is that you're porting over
> > > to `git refs verify` so that one can double check that the port is done
> > > faithfully to the original.
> > > 
> > 
> > I am a little confused here. There is a lot of code in "git-fsck(1)"
> > that checks ref consistency. How could I express this accurately
> > in the commit message?
> 
> Well, you say you ported over a specific consistency check from
> git-fsck(1) to `git refs verify` in the commit message. So I assume that
> it should match a specific check in git-fsck(1), shouldn't it?
> 

I see what you mean. I will improve the commit message in the
next version.

> Patrick

* Re: [PATCH v5 4/9] ref: add more strict checks for regular refs
  2024-10-07  9:25                 ` Patrick Steinhardt
@ 2024-10-07 12:19                   ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-07 12:19 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 07, 2024 at 11:25:17AM +0200, Patrick Steinhardt wrote:

[snip]

> > 
> > In previous versions, I did map each message ID to exactly one
> > error case. But in v4, Junio asked:
> > 
> >   Not limited to this patch, but isn't fsck_report_ref() misdesigned,
> >   or is it just they are used poorly in these patches?  In these two
> >   callsites, the message string parameter does not give any more
> >   information than what the FSCK_MSG_* enum gives.
> > 
> >   That is what I meant by "misdesigned"---if one message enum always
> >   corresponds to one human-readable message, there is not much point
> >   in forcing callers to supply both, is there?
> > 
> > In my opinion, trailing garbage and a missing trailing newline should
> > be reported as a single case. When writing the code, I chose the name
> > "unofficialFormattedRef" for the following reasons:
> > 
> >   1. If we use two message IDs here, then for every message ID we need
> >   to tell the user "please report this to the git mailing list".
> > 
> >   2. If we later decide to make this an error, we could simply
> >   classify both under the "badRefContent" message category.
> > 
> >   3. The semantics are correct: these are genuinely curiously formatted
> >   refs, and the message will eventually tell the user what exactly
> >   is curious.
> > 
> > So, I think we should not always map one message ID to one error case.
> 
> From my point of view the error codes should be the single source of
> truth, as this is what a user can use to disable specific checks. So if
> one code maps to multiple messages, users can only disable all of those
> messages at once.
> 

Thanks for the reminder; I had completely forgotten about this. I have
changed my mind: we should use a one-to-one mapping here. As you said,
otherwise users would have a bad experience.

> I don't disagree with what Junio is saying. It is somewhat redundant
> that the caller has to pass both a code and a message in the current
> form; it should be sufficient for them to pass the code, and the
> message can then e.g. be extracted from a central array that maps codes
> to messages.
> 
> But you can also make the reverse argument: messages can be dynamic, so
> that the caller may include additional details around why specifically
> the check failed. The code and message would still be 1:1, but we may
> include additional details like that to guide the user.
> 

Yes, I will refactor "fsck_report" to allow callers to pass a "NULL"
message when the fsck message ID alone describes the error case
clearly enough.

So there is more to do here.

> Patrick

Thanks,
Jialuo

* Re: [PATCH v5 0/9] add ref content check for files backend
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (9 preceding siblings ...)
  2024-09-30 18:57           ` [PATCH v5 0/9] add " Junio C Hamano
@ 2024-10-07 12:49           ` shejialuo
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
  11 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-07 12:49 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

From the discussions with Patrick on v5 and Junio on v4, I conclude
the following:

1. "fsck_ref_report" should not be refactored to accept `NULL`. There
is only one situation where the message adds little (the content of a
ref does not end with a newline). In the other situations, the message
part is useful, for example:

  refs/heads/garbage-branch: trailingRefContent: ' garbage'.
  refs/heads/escape: escapeReferent: referent 'xxx' is outside.

Although the fsck message ID alone is enough for some messages,
specifying a message as well does no harm.

2. The mapping from fsck message id to error case should be one to one.
This is especially important because users can set fsck error levels
per message ID. If we used a many-to-one mapping, we would give them a bad
experience. We should avoid this.
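
Point 2 can be illustrated with a per-ID severity table. In this sketch the message IDs, their default levels, and `set_msg_level()` are all hypothetical; git exposes a similar knob through its "fsck.<msg-id>" configuration. Because each error case has its own ID, silencing one case leaves the other untouched.

```c
#include <string.h>

enum msg_level { LEVEL_IGNORE, LEVEL_WARN, LEVEL_ERROR };

/* Hypothetical IDs; one entry per error case (1:1 mapping). */
enum msg_id { MSG_REF_MISSING_NEWLINE, MSG_TRAILING_REF_CONTENT, MSG_COUNT };

struct msg_info {
	const char *camel_name;	/* name a user would configure */
	enum msg_level level;
};

static struct msg_info msgs[MSG_COUNT] = {
	[MSG_REF_MISSING_NEWLINE]  = { "refMissingNewline", LEVEL_WARN },
	[MSG_TRAILING_REF_CONTENT] = { "trailingRefContent", LEVEL_WARN },
};

/* Change the level of the message whose name matches; -1 if unknown. */
int set_msg_level(const char *camel_name, enum msg_level level)
{
	for (int i = 0; i < MSG_COUNT; i++) {
		if (!strcmp(msgs[i].camel_name, camel_name)) {
			msgs[i].level = level;
			return 0;
		}
	}
	return -1;
}

enum msg_level get_msg_level(enum msg_id id)
{
	return msgs[id].level;
}
```

In this sketch, setting "refMissingNewline" to ignore does not hide "trailingRefContent" findings. That independence is the user-facing reason for a 1:1 mapping.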

I will wait for more comments to ensure the next version will be better.

Thanks,
Jialuo

* Re: [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero
  2024-09-29  7:15           ` [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-10-08  7:29             ` Karthik Nayak
  0 siblings, 0 replies; 209+ messages in thread
From: Karthik Nayak @ 2024-10-08  7:29 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

shejialuo <shejialuo@gmail.com> writes:

> In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
> "referent" are NULL. So, we need to always initialize these parameters to
> NULL instead of leaving them pointing anywhere when creating a new
> "fsck_ref_report" structure.
>
> The original code explicitly initializes the "path" member in the
> "struct fsck_ref_report" to NULL (which implicitly 0-initializes other
> members in the struct). It is more customary to use "{ 0 }" to express
> that we are 0-initializing everything. In order to align with the the

s/the//
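
For reference, the "{ 0 }" idiom explicitly initializes the first member to zero. Every other member is initialized as if the object had static storage duration, so every pointer member ends up NULL. The struct below is a hypothetical stand-in for "struct fsck_ref_report" that uses the member names from the commit message; it is not the real definition.

```c
#include <stddef.h>

/* Hypothetical stand-in; member names follow the commit message. */
struct ref_report {
	const char *path;
	const void *oid;	/* stand-in for "const struct object_id *" */
	const char *referent;
};

/* Count how many optional members are set, relying on unset ones being NULL. */
int count_set_fields(const struct ref_report *r)
{
	int n = 0;

	if (r->path)
		n++;
	if (r->oid)
		n++;
	if (r->referent)
		n++;
	return n;
}
```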

[snip]

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-09-29  7:15           ` [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
  2024-10-07  6:58             ` Patrick Steinhardt
@ 2024-10-08  7:43             ` Karthik Nayak
  2024-10-08 12:24               ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: Karthik Nayak @ 2024-10-08  7:43 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

shejialuo <shejialuo@gmail.com> writes:

[snip]

> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_BAD_REF_CONTENT,
> +				      "cannot read ref file");
> +		goto cleanup;
> +	}
> +

Shouldn't we use `die_errno` here instead? I mean, this is not really a
bad ref content issue. If we don't want to die here, it would still
probably be nice to get the actual issue using `strerror` instead and
use that instead of the generic message we have here.

[snip]

* Re: [PATCH v5 5/9] ref: add basic symref content check for files backend
  2024-09-29  7:16           ` [PATCH v5 5/9] ref: add basic symref content check for files backend shejialuo
@ 2024-10-08  7:58             ` Karthik Nayak
  2024-10-08 12:18               ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Karthik Nayak @ 2024-10-08  7:58 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

shejialuo <shejialuo@gmail.com> writes:

> We have code that checks regular ref contents, but we do not yet check
> the contents of symbolic refs. By using "parse_loose_ref_content" for
> symbolic refs, we will get the information of the "referent".
>
> We do not need to check the "referent" by opening the file. This is
> because if "referent" exists in the file system, we will eventually
> check its correctness by inspecting every file in the "refs" directory.
> If the "referent" does not exist in the filesystem, this is OK as it is
> seen as a dangling symref.
>
> So we just need to check the "referent" string content. A regular could

seems like we're missing the noun here, a regular what?

> be accepted as a textual symref if it begins with "ref:", followed by
> zero or more whitespace characters, followed by the full refname, followed
> only by whitespace characters. However, we always write a single SP after
> "ref:" and a single LF after the refname. It may seem that we should report
> a fsck error message when the "referent" does not follow these rules, but
> we should not be so aggressive, because third-party reimplementations of
> Git may have taken advantage of the looser syntax. To be more specific, we
> accept the following "referent":
>
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
>
> When introducing the regular ref content checks, we created a new fsck
> message "unofficialFormattedRef" which represents exactly the above
> situations. So we will reuse this fsck message in these checks to
> inform the user about these situations.
>

Plus to what Patrick said in the previous commit, it would be nice to
separate these issues with different message IDs.

> But we do not allow any other trailing garbage. The following are bad
> symref contents which will be reported as fsck errors by "git-fsck(1)".
>
> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
>
> And we introduce a new "badReferent(ERROR)" fsck message to report above
> errors by using "ref.c::check_refname_format". But we cannot just pass
> the "referent" to this function because the "referent" might contain
> some whitespace, which would cause "check_refname_format" to fail.
>

It would be nice if you could elaborate here, or rather restructure to
say something like..

    Since 'check_refname_format' doesn't work with whitespaces, we use
    the trimmed version of 'referent' with the function.
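
The acceptance rules in the quoted commit message can be summarised in a small, self-contained C sketch. It illustrates those rules under stated assumptions and is not the patch's implementation. `parse_textual_symref()` and the result names are hypothetical, and "whitespace" is taken to mean isspace(). The trimmed referent it returns is what one would copy into a NUL-terminated buffer and pass to a refname check such as check_refname_format().

```c
#include <ctype.h>
#include <string.h>

enum symref_result {
	SYMREF_OFFICIAL,		/* exactly "ref: <refname>\n", as git writes it */
	SYMREF_UNOFFICIAL,		/* accepted, but not the format git writes */
	SYMREF_TRAILING_GARBAGE,	/* non-whitespace after the refname */
	SYMREF_EMPTY_REFERENT,		/* nothing after "ref:" */
	SYMREF_NOT_SYMREF,		/* no "ref:" prefix */
};

/*
 * Hypothetical sketch: "ref:", zero or more whitespace, the refname, then
 * only whitespace. On an accepted result, *referent and *len describe the
 * trimmed refname (not NUL-terminated).
 */
enum symref_result parse_textual_symref(const char *buf,
					const char **referent, size_t *len)
{
	const char *p, *start, *end;

	if (strncmp(buf, "ref:", 4))
		return SYMREF_NOT_SYMREF;

	p = buf + 4;
	while (isspace((unsigned char)*p))
		p++;
	start = p;
	while (*p && !isspace((unsigned char)*p))
		p++;
	end = p;
	if (start == end)
		return SYMREF_EMPTY_REFERENT;

	/* Everything after the refname must be whitespace. */
	for (; *p; p++)
		if (!isspace((unsigned char)*p))
			return SYMREF_TRAILING_GARBAGE;

	*referent = start;
	*len = (size_t)(end - start);

	/* Official format: one SP after "ref:" and a single trailing LF. */
	if (buf[4] == ' ' && start == buf + 5 && !strcmp(end, "\n"))
		return SYMREF_OFFICIAL;
	return SYMREF_UNOFFICIAL;
}
```

Under these rules, accepted examples 1 to 3 from the commit message all return SYMREF_UNOFFICIAL, and both bad examples return SYMREF_TRAILING_GARBAGE.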

[snip]

* Re: [PATCH v5 5/9] ref: add basic symref content check for files backend
  2024-10-08  7:58             ` Karthik Nayak
@ 2024-10-08 12:18               ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-08 12:18 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Tue, Oct 08, 2024 at 12:58:16AM -0700, Karthik Nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We have code that checks regular ref contents, but we do not yet check
> > the contents of symbolic refs. By using "parse_loose_ref_content" for
> > symbolic refs, we will get the information of the "referent".
> >
> > We do not need to check the "referent" by opening the file. This is
> > because if "referent" exists in the file system, we will eventually
> > check its correctness by inspecting every file in the "refs" directory.
> > If the "referent" does not exist in the filesystem, this is OK as it is
> > seen as a dangling symref.
> >
> > So we just need to check the "referent" string content. A regular could
> 
> seems like we're missing the noun here, a regular what?
> 

It should be "a regular ref". I copied the original commit message and
probably typed "daw" in vim by accident, deleting "ref". Thanks.

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-08  7:43             ` Karthik Nayak
@ 2024-10-08 12:24               ` shejialuo
  2024-10-08 17:44                 ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-10-08 12:24 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Tue, Oct 08, 2024 at 12:43:20AM -0700, Karthik Nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> [snip]
> 
> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> > +		ret = fsck_report_ref(o, &report,
> > +				      FSCK_MSG_BAD_REF_CONTENT,
> > +				      "cannot read ref file");
> > +		goto cleanup;
> > +	}
> > +
> 
> Shouldn't we use `die_errno` here instead? I mean, this is not really a
> bad ref content issue. If we don't want to die here, it would still
> probably be nice to get the actual issue using `strerror` instead and
> use that instead of the generic message we have here.
> 

Well, I think I need to dig into the "open" system call here. We
currently have two opinions. Junio thought that we should use
"fsck_report_ref" to report this. Karthik, Patrick and I thought that we
should report it using "*errno" because this is a general error.

Let me investigate in which situations the "open" system call can fail,
so that we can fully justify whichever choice we make.

Thanks,
Jialuo

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-08 12:24               ` shejialuo
@ 2024-10-08 17:44                 ` Junio C Hamano
  2024-10-09  8:05                   ` Patrick Steinhardt
  2024-10-09 11:55                   ` shejialuo
  0 siblings, 2 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-10-08 17:44 UTC (permalink / raw)
  To: shejialuo; +Cc: Karthik Nayak, git, Patrick Steinhardt

shejialuo <shejialuo@gmail.com> writes:

> On Tue, Oct 08, 2024 at 12:43:20AM -0700, Karthik Nayak wrote:
>> shejialuo <shejialuo@gmail.com> writes:
>> 
>> [snip]
>> 
>> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
>> > +		ret = fsck_report_ref(o, &report,
>> > +				      FSCK_MSG_BAD_REF_CONTENT,
>> > +				      "cannot read ref file");
>> > +		goto cleanup;
>> > +	}
>> > +
>> 
>> Shouldn't we use `die_errno` here instead? I mean, this is not really a
>> bad ref content issue. If we don't want to die here, it would still
>> probably be nice to get the actual issue using `strerror` instead and
>> use that instead of the generic message we have here.
>> 
>
> Well, I think I need to dig into the "open" system call here. We
> currently have two opinions. Junio thought that we should use
> "fsck_report_ref" to report this. Karthik, Patrick and I thought that we
> should report it using "*errno" because this is a general error.

What do you mean by "a general error"?  It is true that we failed to
read a ref file, so even if it is an I/O error, I'd think it is OK
to report it as an error while reading one particular ref.

Giving more information is a separate issue.  If fsck_report_ref()
can be extended to take something like

    "cannot read ref file '%s': (%s)", iter->path.buf, strerror(errno)

that would give the user necessary information.
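
To make this concrete, here is a standalone C sketch that reads a ref file. On failure it builds the kind of message suggested above from strerror(errno) and returns an error to the caller instead of dying. It uses plain stdio instead of git's strbuf API, and `read_ref_file()` is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Read up to bufsize-1 bytes of `path` into `buf` (NUL-terminated;
 * bufsize must be at least 1). Returns 0 on success. On failure returns
 * -1 and writes a human-readable reason, including the errno text, into
 * `err` so the caller can report it (e.g. as a fsck finding) and go on.
 */
int read_ref_file(const char *path, char *buf, size_t bufsize,
		  char *err, size_t errlen)
{
	FILE *fp = fopen(path, "r");
	size_t n;

	if (!fp) {
		snprintf(err, errlen, "cannot read ref file '%s': (%s)",
			 path, strerror(errno));
		return -1;
	}
	n = fread(buf, 1, bufsize - 1, fp);
	if (ferror(fp)) {
		int saved_errno = errno;

		fclose(fp);
		snprintf(err, errlen, "cannot read ref file '%s': (%s)",
			 path, strerror(saved_errno));
		return -1;
	}
	fclose(fp);
	buf[n] = '\0';
	return 0;
}
```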

And I agree with half of what Karthik said, i.e., we do not want to
die here if this is meant to run as a part of "git fsck".

I may have said this before, but quite frankly, the API into the
fsck_report_ref() function is misdesigned.  If the single constant
string "cannot read ref file" cannot give more information than
FSCK_MSG_BAD_REF_CONTENT, the parameter has no value.

The fsck.c:report() function, which is the main function to report
fsck's findings before fsck_report_ref() was introduced, did not
have such a problem, as it allowed "const char *fmt, ..." at the
end.  Is it too late to fix the fsck_report_ref()?

Thanks.


* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-08 17:44                 ` Junio C Hamano
@ 2024-10-09  8:05                   ` Patrick Steinhardt
  2024-10-09 11:59                     ` shejialuo
  2024-10-09 11:55                   ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-09  8:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: shejialuo, Karthik Nayak, git

On Tue, Oct 08, 2024 at 10:44:53AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > On Tue, Oct 08, 2024 at 12:43:20AM -0700, Karthik Nayak wrote:
> >> shejialuo <shejialuo@gmail.com> writes:
> >> 
> >> [snip]
> >> 
> >> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> >> > +		ret = fsck_report_ref(o, &report,
> >> > +				      FSCK_MSG_BAD_REF_CONTENT,
> >> > +				      "cannot read ref file");
> >> > +		goto cleanup;
> >> > +	}
> >> > +
> >> 
> >> Shouldn't we use `die_errno` here instead? I mean, this is not really a
> >> bad ref content issue. If we don't want to die here, it would still
> >> probably be nice to get the actual issue using `strerror` instead and
> >> use that instead of the generic message we have here.
> >> 
> >
> > Well, I think I need to dig into the "open" system call here. We
> > currently have two opinions. Junio thought that we should use
> > "fsck_report_ref" to report this. Karthik, Patrick and I thought that we
> > should report it using "*errno" because this is a general error.
> 
> What do you mean by "a general error"?  It is true that we failed to
> read a ref file, so even if it is an I/O error, I'd think it is OK
> to report it as an error while reading one particular ref.
> 
> Giving more information is a separate issue.  If fsck_report_ref()
> can be extended to take something like
> 
>     "cannot read ref file '%s': (%s)", iter->path.buf, strerror(errno)
> 
> that would give the user necessary information.

Yeah, this is also in line with what I proposed elsewhere, where we have
been discussing the 1:1 mapping between error codes and error messages.
If the error messages were dynamic, they would be a whole lot more useful
and could provide more context.

> And I agree with half of what Karthik said, i.e., we do not want to
> die here if this is meant to run as a part of "git fsck".
> 
> I may have said this before, but quite frankly, the API into the
> fsck_report_ref() function is misdesigned.  If the single constant
> string "cannot read ref file" cannot give more information than
> FSCK_MSG_BAD_REF_CONTENT, the parameter has no value.

True in the current form, yeah. If `fsck_report_ref()` learned to take a
vararg argument and treat its first argument as a string format it would
be justified though, as the message is now dynamic and can contain more
context around the specific failure that cannot be provided statically
via the 1:1 mapping between error code and message.

> The fsck.c:report() function, which is the main function to report
> fsck's findings before fsck_report_ref() was introduced, did not
> have such a problem, as it allowed "const char *fmt, ..." at the
> end.  Is it too late to fix the fsck_report_ref()?

I don't think so, I think we should be able to refactor the code rather
easily to do so.

Patrick

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-08 17:44                 ` Junio C Hamano
  2024-10-09  8:05                   ` Patrick Steinhardt
@ 2024-10-09 11:55                   ` shejialuo
  1 sibling, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-09 11:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, git, Patrick Steinhardt

On Tue, Oct 08, 2024 at 10:44:53AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > On Tue, Oct 08, 2024 at 12:43:20AM -0700, Karthik Nayak wrote:
> >> shejialuo <shejialuo@gmail.com> writes:
> >> 
> >> [snip]
> >> 
> >> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> >> > +		ret = fsck_report_ref(o, &report,
> >> > +				      FSCK_MSG_BAD_REF_CONTENT,
> >> > +				      "cannot read ref file");
> >> > +		goto cleanup;
> >> > +	}
> >> > +
> >> 
> >> Shouldn't we use `die_errno` here instead? I mean, this is not really a
> >> bad ref content issue. If we don't want to die here, it would still
> >> probably be nice to get the actual issue using `strerror` instead and
> >> use that instead of the generic message we have here.
> >> 
> >
> > Well, I think I need to dive into the "open" system call here. Actually,
> > we have two opinions now. Junio thought that we should use
> > "fsck_report_ref" to report. Karthik, Patrick and I thought that we
> > should report using "*errno" because this is a general error.
> 
> What do you mean by "a general error"?  It is true that we failed to
> read a ref file, so even if it is an I/O error, I'd think it is OK
> to report it as an error while reading one particular ref.

Makes sense.

> Giving more information is a separate issue.  If fsck_report_ref()
> can be extended to take something like
> 
>     "cannot read ref file '%s': (%s)", iter->path.buf, strerror(errno)
> 
> that would give the user necessary information.

`fsck_report_ref` can already do this. I think I simply used the
`fsck_report_ref` function badly in this case.

> And I agree with half of what Karthik said, i.e., we do not want to
> die here if this is meant to run as a part of "git fsck".

Yes, we should not make the program die. Instead, we should report the
problem and continue checking the other refs.
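
To illustrate "report, don't die" together with Junio's idea of including
`strerror(errno)` in the message, here is a minimal standalone sketch. It
does not use Git's internal API: `check_ref_files()` and its message
buffer are hypothetical, and plain `fopen()` stands in for
`strbuf_read_file()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: try to open every ref file in `paths`. A failure
 * is recorded together with the errno text, and the loop moves on to the
 * next ref instead of dying. Returns the number of refs that could not
 * be read; the message for the last failure is left in `msg`.
 */
static int check_ref_files(const char **paths, size_t n,
			   char *msg, size_t msg_len)
{
	int failures = 0;

	assert(msg && msg_len > 0);
	for (size_t i = 0; i < n; i++) {
		FILE *fp = fopen(paths[i], "r");

		if (!fp) {
			/* Save errno before any other call can clobber it. */
			int saved_errno = errno;

			snprintf(msg, msg_len, "cannot read ref file '%s': (%s)",
				 paths[i], strerror(saved_errno));
			failures++;
			continue; /* keep checking the remaining refs */
		}
		fclose(fp);
	}
	return failures;
}
```

With two unreadable paths the function returns 2 rather than stopping at
the first one, which is the behaviour we want from "git fsck".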

> I may have said this before, but quite frankly, the API into the
> fsck_report_ref() function is misdesigned.  If the single constant
> string "cannot read ref file" cannot give more information than
> FSCK_MSG_BAD_REF_CONTENT, the parameter has no value.
> 
> The fsck.c:report() function, which is the main function to report
> fsck's findings before fsck_report_ref() was introduced, did not
> have such a problem, as it allowed "const char *fmt, ..." at the
> end.  Is it too late to fix the fsck_report_ref()?

I agree that if the FSCK message id explains the error well enough, there
is no need to provide an extra message. But I don't think
`fsck_report_ref` is misdesigned here. It is just like the
"fsck.c::report" function and also takes "const char *fmt, ..." at the
end, as the following shows:

    int fsck_report_ref(struct fsck_options *options,
                        struct fsck_ref_report *report,
                        enum fsck_msg_id msg_id,
                        const char *fmt, ...)

And I think the "fsck.c::report" function has exactly the same problem.
Here are some examples from "fsck.c":

    report(options, tree_oid, OBJ_TREE,
           FSCK_MSG_BAD_FILEMODE,
           "contains bad file modes");

    report(options, tree_oid, OBJ_TREE,
           FSCK_MSG_DUPLICATE_ENTRIES,
           "contains duplicate file entries");

    ...

So there is no difference between "fsck_report_ref" and
"fsck.c::report". When I refactored the code during my GSoC project, the
main concern was to reuse the original "fsck.c::report" code instead of
writing redundant code.

In the end, I extracted a new function "fsck_vreport" from the original
"fsck.c::report" function; it is called by both "fsck_report_ref" and
"fsck.c::report".

    static int fsck_vreport(struct fsck_options *options,
                            void *fsck_report,
                            enum fsck_msg_id msg_id,
                            const char *fmt, va_list ap)
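
The shape of this refactoring (one `va_list` core with thin variadic
wrappers) can be sketched in isolation. This is not Git's code:
`demo_vreport()`, `demo_report_object()`, `demo_report_ref()` and the
two-entry message table are hypothetical stand-ins for `fsck_vreport()`,
`fsck.c::report()`, `fsck_report_ref()` and the real fsck message ids.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message ids, standing in for enum fsck_msg_id. */
enum demo_msg_id { DEMO_BAD_REF_CONTENT, DEMO_DUPLICATE_ENTRIES };

static const char *demo_msg_name[] = {
	[DEMO_BAD_REF_CONTENT] = "badRefContent",
	[DEMO_DUPLICATE_ENTRIES] = "duplicateEntries",
};

/* Last formatted report, kept so the behaviour is easy to inspect. */
static char demo_last_report[256];

/* Shared core: all formatting happens here, on an already-started va_list. */
static int demo_vreport(const char *subject, enum demo_msg_id id,
			const char *fmt, va_list ap)
{
	int len = snprintf(demo_last_report, sizeof(demo_last_report),
			   "%s: %s: ", subject, demo_msg_name[id]);

	if (len < 0 || (size_t)len >= sizeof(demo_last_report))
		return -1;
	vsnprintf(demo_last_report + len, sizeof(demo_last_report) - len,
		  fmt, ap);
	return -1; /* this sketch treats every report as an error */
}

/* Wrapper for object problems, in the spirit of fsck.c::report(). */
static int demo_report_object(const char *oid_hex, enum demo_msg_id id,
			      const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = demo_vreport(oid_hex, id, fmt, ap);
	va_end(ap);
	return ret;
}

/* Wrapper for ref problems, in the spirit of fsck_report_ref(). */
static int demo_report_ref(const char *refname, enum demo_msg_id id,
			   const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = demo_vreport(refname, id, fmt, ap);
	va_end(ap);
	return ret;
}
```

Because both wrappers only start and end the `va_list`, any format
string (including one with `%s` and `strerror(errno)`) works the same way
for objects and refs.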

From my perspective, if we decide to refactor, we should allow callers to
write the following:

    fsck_report_ref(..., FSCK_MSG_BAD_REF_CONTENT, NULL);
    report(..., FSCK_MSG_DUPLICATE_ENTRIES, NULL);

So we should check whether `fmt` is NULL in `fsck_vreport`, so that when
the FSCK message id alone explains what happened, callers do not need to
pass any message at all.
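
A minimal sketch of that NULL check, assuming that only the message id is
printed when no format string is given. `demo_vformat()` and
`demo_format()` are hypothetical names, not Git functions, and the output
layout is only illustrative.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<subject>: <msgId>" into buf and append ": <details>" only
 * when the caller supplied a format string. A NULL fmt means the message
 * id is considered self-explanatory.
 */
static void demo_vformat(char *buf, size_t len, const char *subject,
			 const char *msg_id, const char *fmt, va_list ap)
{
	int n = snprintf(buf, len, "%s: %s", subject, msg_id);

	/* Nothing more to add, or no room left for ": " plus details. */
	if (!fmt || n < 0 || (size_t)n + 2 >= len)
		return;
	buf[n++] = ':';
	buf[n++] = ' ';
	vsnprintf(buf + n, len - n, fmt, ap);
}

/* Variadic front end; fmt may be NULL. */
static void demo_format(char *buf, size_t len, const char *subject,
			const char *msg_id, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	demo_vformat(buf, len, subject, msg_id, fmt, ap);
	va_end(ap);
}
```

A call such as `demo_format(buf, sizeof(buf), "refs/heads/main",
"badRefContent", NULL)` then produces just the subject and message id.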

> Thanks.

Thanks,
Jialuo

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-09  8:05                   ` Patrick Steinhardt
@ 2024-10-09 11:59                     ` shejialuo
  2024-10-10  6:52                       ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-10-09 11:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, Karthik Nayak, git

On Wed, Oct 09, 2024 at 10:05:19AM +0200, Patrick Steinhardt wrote:

[snip]

> > I may have said this before, but quite frankly, the API into the
> > fsck_report_ref() function is misdesigned.  If the single constant
> > string "cannot read ref file" cannot give more information than
> > FSCK_MSG_BAD_REF_CONTENT, the parameter has no value.
> 
> True in the current form, yeah. If `fsck_report_ref()` learned to take a
> vararg argument and treat its first argument as a string format it would
> be justified though, as the message is now dynamic and can contain more
> context around the specific failure that cannot be provided statically
> via the 1:1 mapping between error code and message.
> 

It does not need to "learn" anything: `fsck_report_ref` can already do
this, just like "fsck.c::report". I explained this in my reply to
Junio.

> > The fsck.c:report() function, which is the main function to report
> > fsck's findings before fsck_report_ref() was introduced, did not
> > have such a problem, as it allowed "const char *fmt, ..." at the
> > end.  Is it too late to fix the fsck_report_ref()?
> 
> I don't think so, I think we should be able to refactor the code rather
> easily to do so.
> 

It's not hard to refactor the code, but that is not the problem. I am a
little confused here, because "fsck_report_ref" already takes
"const char *fmt, ..." at the end.

> Patrick

Thanks,
Jialuo

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-09 11:59                     ` shejialuo
@ 2024-10-10  6:52                       ` Patrick Steinhardt
  2024-10-10 16:00                         ` Junio C Hamano
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-10-10  6:52 UTC (permalink / raw)
  To: shejialuo; +Cc: Junio C Hamano, Karthik Nayak, git

On Wed, Oct 09, 2024 at 07:59:20PM +0800, shejialuo wrote:
> On Wed, Oct 09, 2024 at 10:05:19AM +0200, Patrick Steinhardt wrote:
> > > The fsck.c:report() function, which is the main function to report
> > > fsck's findings before fsck_report_ref() was introduced, did not
> > > have such a problem, as it allowed "const char *fmt, ..." at the
> > > end.  Is it too late to fix the fsck_report_ref()?
> > 
> > I don't think so, I think we should be able to refactor the code rather
> > easily to do so.
> > 
> 
> It's not hard to refactor the code, but that is not the problem. I am a
> little confused here, because "fsck_report_ref" already takes
> "const char *fmt, ..." at the end.

Ah, I didn't double check, but was operating on what I understood from
this thread. In that case I think that the current interface is okay.

Patrick

* Re: [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-10  6:52                       ` Patrick Steinhardt
@ 2024-10-10 16:00                         ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-10-10 16:00 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: shejialuo, Karthik Nayak, git

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Oct 09, 2024 at 07:59:20PM +0800, shejialuo wrote:
>> On Wed, Oct 09, 2024 at 10:05:19AM +0200, Patrick Steinhardt wrote:
>> > > The fsck.c:report() function, which is the main function to report
>> > > fsck's findings before fsck_report_ref() was introduced, did not
>> > > have such a problem, as it allowed "const char *fmt, ..." at the
>> > > end.  Is it too late to fix the fsck_report_ref()?
>> > 
>> > I don't think so, I think we should be able to refactor the code rather
>> > easily to do so.
>> > 
>> 
>> It's not hard to refactor the code, but that is not the problem. I am a
>> little confused here, because "fsck_report_ref" already takes
>> "const char *fmt, ..." at the end.
>
> Ah, I didn't double check, but was operating on what I understood from
> this thread. In that case I think that the current interface is okay.

I didn't, either.  So there is an obvious way out for "why aren't we
telling the errno to users" issue?  That's good.

* [PATCH v6 0/9] add ref content check for files backend
  2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
                             ` (10 preceding siblings ...)
  2024-10-07 12:49           ` shejialuo
@ 2024-10-21 13:32           ` shejialuo
  2024-10-21 13:34             ` [PATCH v6 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
                               ` (11 more replies)
  11 siblings, 12 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:32 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This new version updates the following things.

First, the new things. [PATCH v6 2/9] and [PATCH v6 3/9] fix a bug in
the way I implemented the refname check in the following code:

    if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
            ret = fsck_report_ref(...);
    }

Because only the basename is checked, the code wrongly reports an error
for "refs/heads/@": "@" on its own is not a valid refname, but
"refs/heads/@" is. I fix this issue in two commits.
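
Why the basename check misfires: among the rules in
git-check-ref-format(1), a refname may not be the single character "@",
yet "@" is fine as one component of a longer name. The sketch below
implements only a small subset of those rules; `demo_refname_ok()` is
hypothetical and is not Git's `check_refname_format()`, but it is enough
to show that checking "@" in isolation and checking "refs/heads/@" give
different answers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Check one slash-separated component against a few refname rules. */
static bool demo_component_ok(const char *c, size_t len)
{
	if (len == 0 || c[0] == '.')
		return false; /* empty, or starts with '.' */
	if (len >= 5 && !memcmp(c + len - 5, ".lock", 5))
		return false; /* ends with ".lock" */
	return true;
}

/*
 * Simplified subset of the git-check-ref-format(1) rules; this is NOT a
 * replacement for check_refname_format().
 */
static bool demo_refname_ok(const char *name)
{
	const char *start = name;

	assert(name != NULL);
	if (!strcmp(name, "@"))
		return false; /* the single character "@" is not allowed */
	if (strstr(name, "..") || strstr(name, "@{"))
		return false;
	for (const char *p = name; ; p++) {
		unsigned char ch = (unsigned char)*p;

		if (ch == '/' || ch == '\0') {
			if (!demo_component_ok(start, (size_t)(p - start)))
				return false;
			if (ch == '\0')
				break;
			start = p + 1;
			continue;
		}
		/* control characters, space and ~ ^ : ? * [ \ are forbidden */
		if (ch < 0x20 || ch == 0x7f || strchr(" ~^:?*[\\", ch))
			return false;
	}
	return name[strlen(name) - 1] != '.'; /* must not end with '.' */
}
```

This is also why the updated tests below replace "@" with names such as
" branch-1" and "~tag-2", which are invalid as full refnames.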

Changes against the previous version:

1. Split [PATCH v5 8/9] and fold each part into the commit it belongs to.

2. In [PATCH v6 4/9], print the full name of worktree refs to avoid
ambiguity.

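3. Use a one-to-one mapping between fsck message ids and problems, e.g.
split "unofficialFormattedRef" into "refMissingNewline" and
"trailingRefContent".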

4. Improve the commit messages, and pass more useful information to
"fsck_report_ref" (e.g. the errno text when a ref file cannot be read).

5. Rename "escapeReferent" to "symrefTargetIsNotARef". I agree that
"escape" is not the right word. As I could not find a more elegant name,
I followed Patrick's advice.
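
To make the one-to-one mapping in point 3 concrete, here is a standalone
sketch of how the content of a regular (non-symbolic) loose ref could be
classified into the two new INFO-level ids, following the checks on
`trailing` in `files_fsck_refs_content()`. `demo_classify_ref()` and its
enum are hypothetical. The sketch assumes a SHA-1 repository (40 hex
digits) and assumes, as Git's loose-ref parser does, that an object id
followed directly by a non-whitespace character is broken content rather
than trailing content.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum demo_ref_status {
	DEMO_REF_OK,               /* exactly "<40 hex digits>\n" */
	DEMO_REF_BAD_CONTENT,      /* not an object id (badRefContent) */
	DEMO_REF_MISSING_NEWLINE,  /* refMissingNewline */
	DEMO_REF_TRAILING_CONTENT, /* trailingRefContent */
};

/* Hypothetical classifier for regular loose refs; assumes SHA-1. */
static enum demo_ref_status demo_classify_ref(const char *content)
{
	const char *trailing = content;

	/* Stops at the first non-hex byte, including the terminating NUL. */
	for (int i = 0; i < 40; i++, trailing++)
		if (!isxdigit((unsigned char)*trailing))
			return DEMO_REF_BAD_CONTENT;

	/* An id glued to other text is treated as broken, not trailing. */
	if (*trailing && !isspace((unsigned char)*trailing))
		return DEMO_REF_BAD_CONTENT;

	if (!*trailing)
		return DEMO_REF_MISSING_NEWLINE;
	if (*trailing != '\n' || *(trailing + 1))
		return DEMO_REF_TRAILING_CONTENT;
	return DEMO_REF_OK;
}
```

Each outcome maps to exactly one message id, which is what lets users
tune them separately, e.g. with `fsck.refMissingNewline=ignore`.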

The interdiff is included below to help reviewers.

Thanks,
Jialuo

shejialuo (9):
  ref: initialize "fsck_ref_report" with zero
  ref: check the full refname instead of basename
  ref: initialize target name outside of check functions
  ref: support multiple worktrees check for refs
  ref: port git-fsck(1) regular refs check for files backend
  ref: add more strict checks for regular refs
  ref: add basic symref content check for files backend
  ref: check whether the target of the symref is a ref
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  35 +++
 builtin/refs.c                |  12 +-
 fsck.h                        |   6 +
 refs.c                        |   7 +-
 refs.h                        |   3 +-
 refs/debug.c                  |   5 +-
 refs/files-backend.c          | 187 ++++++++++++--
 refs/packed-backend.c         |   8 +-
 refs/refs-internal.h          |   5 +-
 refs/reftable-backend.c       |   3 +-
 t/t0602-reffiles-fsck.sh      | 457 +++++++++++++++++++++++++++++++++-
 11 files changed, 693 insertions(+), 35 deletions(-)

Interdiff against v5:
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index ffe9d6a2f6..b14bc44ca4 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,8 +28,8 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
-`badReferent`::
-	(ERROR) The referent of a ref is invalid.
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
 
 `badTagName`::
 	(INFO) A tag has an invalid format.
@@ -52,14 +52,6 @@
 `emptyName`::
 	(WARN) A path contains an empty name.
 
-`escapeReferent`::
-	(INFO) The referent of a symref is outside the "ref" directory.
-	Although we allow create a symref pointing to the referent which
-	is outside the "ref" by using `git symbolic-ref`, we may tighten
-	the rule in the future. Report to the git@vger.kernel.org
-	mailing list if you see this error, as we need to know what tools
-	created such a file.
-
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
@@ -184,11 +176,34 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A loose ref that does not end with newline(LF). As
+	valid implementations of Git never created such a loose ref
+	file, it may become an error in the future. Report to the
+	git@vger.kernel.org mailing list if you see this error, as
+	we need to know what tools created such a file.
+
 `symlinkRef`::
-	(INFO) A symbolic link is used as a symref.  Report to the
+	(INFO) A symbolic link is used as a symref. Report to the
 	git@vger.kernel.org mailing list if you see this error, as we
 	are assessing the feasibility of dropping the support to drop
-	creating symblinks as symrefs.
+	creating symbolic links as symrefs.
+
+`symrefTargetIsNotARef`::
+	(INFO) The target of a symbolic reference points neither to
+	a root reference nor to a reference starting with "refs/".
+	Although we allow create a symref pointing to the referent which
+	is outside the "ref" by using `git symbolic-ref`, we may tighten
+	the rule in the future. Report to the git@vger.kernel.org
+	mailing list if you see this error, as we need to know what tools
+	created such a file.
+
+`trailingRefContent`::
+	(INFO) A loose ref has trailing content. As valid implementations
+	of Git never created such a loose ref file, it may become an
+	error in the future. Report to the git@vger.kernel.org mailing
+	list if you see this error, as we need to know what tools
+	created such a file.
 
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
@@ -196,14 +211,6 @@
 `unknownType`::
 	(ERROR) Found an unknown object type.
 
-`unofficialFormattedRef`::
-	(INFO) The content of a loose ref file is not in the official
-	format such as not having a LF at the end or having trailing
-	garbage. As valid implementations of Git never created such a
-	loose ref file, it may become an error in the future. Report
-	to the git@vger.kernel.org mailing list if you see this error,
-	as we need to know what tools created such a file.
-
 `unterminatedHeader`::
 	(FATAL) Missing end-of-line in the object header.
 
diff --git a/builtin/refs.c b/builtin/refs.c
index 3c492ea922..886c4ceae3 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -89,9 +89,10 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 	worktrees = get_worktrees();
 	for (p = worktrees; *p; p++) {
 		struct worktree *wt = *p;
-		ret += refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options);
+		ret |= refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options, wt);
 	}
 
+
 	fsck_options_clear(&fsck_refs_options);
 	free_worktrees(worktrees);
 	return ret;
diff --git a/fsck.h b/fsck.h
index f1da5c8a77..a44c231a5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,7 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
-	FUNC(BAD_REFERENT, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
@@ -80,7 +80,6 @@ enum fsck_msg_type {
 	FUNC(LARGE_PATHNAME, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(BAD_FILEMODE, INFO) \
-	FUNC(ESCAPE_REFERENT, INFO) \
 	FUNC(GITMODULES_PARSE, INFO) \
 	FUNC(GITIGNORE_SYMLINK, INFO) \
 	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
@@ -88,7 +87,9 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(SYMLINK_REF, INFO) \
-	FUNC(UNOFFICIAL_FORMATTED_REF, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 6ba1bb1aa1..f88b32a633 100644
--- a/refs.c
+++ b/refs.c
@@ -318,9 +318,10 @@ int check_refname_format(const char *refname, int flags)
 	return check_or_sanitize_refname(refname, flags, NULL);
 }
 
-int refs_fsck(struct ref_store *refs, struct fsck_options *o)
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt)
 {
-	return refs->be->fsck(refs, o);
+	return refs->be->fsck(refs, o, wt);
 }
 
 void sanitize_refname_component(const char *refname, struct strbuf *out)
diff --git a/refs.h b/refs.h
index 108dfc93b3..341d43239c 100644
--- a/refs.h
+++ b/refs.h
@@ -549,7 +549,8 @@ int check_refname_format(const char *refname, int flags);
  * reflogs are consistent, and non-zero otherwise. The errors will be
  * written to stderr.
  */
-int refs_fsck(struct ref_store *refs, struct fsck_options *o);
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt);
 
 /*
  * Apply the rules from check_refname_format, but mutate the result until it
diff --git a/refs/debug.c b/refs/debug.c
index 45e2e784a0..72e80ddd6d 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -420,10 +420,11 @@ static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
 }
 
 static int debug_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
-	int res = drefs->refs->be->fsck(drefs->refs, o);
+	int res = drefs->refs->be->fsck(drefs->refs, o, wt);
 	trace_printf_key(&trace_refs, "fsck: %d\n", res);
 	return res;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 5a5327a146..180f8e28b7 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -24,6 +24,7 @@
 #include "../dir.h"
 #include "../chdir-notify.h"
 #include "../setup.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../revision.h"
@@ -3506,7 +3507,7 @@ static int files_ref_store_remove_on_disk(struct ref_store *ref_store,
  */
 typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  struct fsck_options *o,
-				  const char *refs_check_dir,
+				  const char *target_name,
 				  struct dir_iterator *iter);
 
 static int files_fsck_symref_target(struct fsck_options *o,
@@ -3514,27 +3515,29 @@ static int files_fsck_symref_target(struct fsck_options *o,
 				    struct strbuf *referent,
 				    unsigned int symbolic_link)
 {
+	int is_referent_root;
 	char orig_last_byte;
 	size_t orig_len;
 	int ret = 0;
 
-	if (!symbolic_link) {
-		orig_len = referent->len;
-		orig_last_byte = referent->buf[orig_len - 1];
+	orig_len = referent->len;
+	orig_last_byte = referent->buf[orig_len - 1];
+	if (!symbolic_link)
 		strbuf_rtrim(referent);
-	}
 
-	if (!starts_with(referent->buf, "refs/") &&
+	is_referent_root = is_root_ref(referent->buf);
+	if (!is_referent_root &&
+	    !starts_with(referent->buf, "refs/") &&
 	    !starts_with(referent->buf, "worktrees/")) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_ESCAPE_REFERENT,
-				      "referent '%s' is outside of refs/ or worktrees/",
-				      referent->buf);
+				      FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF,
+				      "points to non-ref target '%s'", referent->buf);
+
 	}
 
-	if (check_refname_format(referent->buf, 0)) {
+	if (!is_referent_root && check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_BAD_REFERENT,
+				      FSCK_MSG_BAD_REFERENT_NAME,
 				      "points to invalid refname '%s'", referent->buf);
 		goto out;
 	}
@@ -3542,17 +3545,16 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	if (symbolic_link)
 		goto out;
 
-
 	if (referent->len == orig_len ||
 	    (referent->len < orig_len && orig_last_byte != '\n')) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
 				      "misses LF at the end");
 	}
 
 	if (referent->len != orig_len && referent->len != orig_len - 1) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
 				      "has trailing whitespaces or newlines");
 	}
 
@@ -3562,13 +3564,12 @@ static int files_fsck_symref_target(struct fsck_options *o,
 
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
-				   const char *refs_check_dir,
+				   const char *target_name,
 				   struct dir_iterator *iter)
 {
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
-	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
 	const char *trailing = NULL;
 	unsigned int type = 0;
@@ -3576,8 +3577,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct object_id oid;
 	int ret = 0;
 
-	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
-	report.path = refname.buf;
+	report.path = target_name;
 
 	if (S_ISLNK(iter->st.st_mode)) {
 		const char* relative_referent_path = NULL;
@@ -3600,14 +3600,15 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		else
 			strbuf_addbuf(&referent, &ref_content);
 
-		ret += files_fsck_symref_target(o, &report, &referent, 1);
+		ret |= files_fsck_symref_target(o, &report, &referent, 1);
 		goto cleanup;
 	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
-				      "cannot read ref file");
+				      "cannot read ref file '%s': (%s)",
+				      iter->path.buf, strerror(errno));
 		goto cleanup;
 	}
 
@@ -3624,13 +3625,13 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	if (!(type & REF_ISSYMREF)) {
 		if (!*trailing) {
 			ret = fsck_report_ref(o, &report,
-					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
 					      "misses LF at the end");
 			goto cleanup;
 		}
 		if (*trailing != '\n' || *(trailing + 1)) {
 			ret = fsck_report_ref(o, &report,
-					      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
 					      "has trailing garbage: '%s'", trailing);
 			goto cleanup;
 		}
@@ -3640,7 +3641,6 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	}
 
 cleanup:
-	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
 	strbuf_release(&abs_gitdir);
@@ -3649,7 +3649,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
-				const char *refs_check_dir,
+				const char *target_name,
 				struct dir_iterator *iter)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -3662,11 +3662,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
 		goto cleanup;
 
-	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
+	if (check_refname_format(target_name, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
-		report.path = sb.buf;
+		report.path = target_name;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
 				      "invalid refname format");
@@ -3680,8 +3679,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       struct fsck_options *o,
 			       const char *refs_check_dir,
+			       struct worktree *wt,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
+	struct strbuf target_name = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	struct dir_iterator *iter;
 	int iter_status;
@@ -3700,11 +3701,18 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			continue;
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
+			strbuf_reset(&target_name);
+
+			if (!is_main_worktree(wt))
+				strbuf_addf(&target_name, "worktrees/%s/", wt->id);
+			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
+				    iter->relative_path);
+
 			if (o->verbose)
-				fprintf_ln(stdout, "Checking %s/%s",
-					   refs_check_dir, iter->relative_path);
+				fprintf_ln(stderr, "Checking %s", target_name.buf);
+
 			for (size_t i = 0; fsck_refs_fn[i]; i++) {
-				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
+				if (fsck_refs_fn[i](ref_store, o, target_name.buf, iter))
 					ret = -1;
 			}
 		} else {
@@ -3721,11 +3729,13 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 
 out:
 	strbuf_release(&sb);
+	strbuf_release(&target_name);
 	return ret;
 }
 
 static int files_fsck_refs(struct ref_store *ref_store,
-			   struct fsck_options *o)
+			   struct fsck_options *o,
+			   struct worktree *wt)
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
@@ -3733,27 +3743,20 @@ static int files_fsck_refs(struct ref_store *ref_store,
 		NULL,
 	};
 
-	fprintf_ln(stdout, _("Checking references consistency in %s"),
-		   ref_store->gitdir);
-	return files_fsck_refs_dir(ref_store, o,  "refs", fsck_refs_fn);
+	if (o->verbose)
+		fprintf_ln(stderr, _("Checking references consistency"));
+	return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
 }
 
 static int files_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ, "fsck");
 
-	int ret = files_fsck_refs(ref_store, o);
-
-	/*
-	 * packed-refs should only be checked once because it is shared
-	 * between all worktrees.
-	 */
-	if (!strcmp(ref_store->gitdir, ref_store->repo->gitdir))
-		ret += refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
-
-	return ret;
+	return files_fsck_refs(ref_store, o, wt) |
+	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt);
 }
 
 struct ref_storage_be refs_be_files = {
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 07c57fd541..46dcaec654 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -13,6 +13,7 @@
 #include "../lockfile.h"
 #include "../chdir-notify.h"
 #include "../statinfo.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../trace2.h"
@@ -1754,8 +1755,13 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 }
 
 static int packed_fsck(struct ref_store *ref_store UNUSED,
-		       struct fsck_options *o UNUSED)
+		       struct fsck_options *o UNUSED,
+		       struct worktree *wt)
 {
+
+	if (!is_main_worktree(wt))
+		return 0;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 73b05f971b..125f1fe735 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -653,7 +653,8 @@ typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refnam
 				 struct strbuf *referent);
 
 typedef int fsck_fn(struct ref_store *ref_store,
-		    struct fsck_options *o);
+		    struct fsck_options *o,
+		    struct worktree *wt);
 
 struct ref_storage_be {
 	const char *name;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index f5f957e6de..b6a63c1015 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2443,7 +2443,8 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 }
 
 static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
-			    struct fsck_options *o UNUSED)
+			    struct fsck_options *o UNUSED,
+			    struct worktree *wt UNUSED)
 {
 	return 0;
 }
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index be4c064b3c..aee7e04b82 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -25,6 +25,13 @@ test_expect_success 'ref name should be checked' '
 	git tag tag-2 &&
 	git tag multi_hierarchy/tag-2 &&
 
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	EOF
+	test_must_be_empty err &&
+	rm $branch_dir_prefix/@ &&
+
 	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
@@ -33,20 +40,20 @@ test_expect_success 'ref name should be checked' '
 	rm $branch_dir_prefix/.branch-1 &&
 	test_cmp expect err &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\'' branch-1'\'' &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/@: badRefName: invalid refname format
+	error: refs/heads/ branch-1: badRefName: invalid refname format
 	EOF
-	rm $branch_dir_prefix/@ &&
+	rm $branch_dir_prefix/'\'' branch-1'\'' &&
 	test_cmp expect err &&
 
-	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/@ &&
+	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/'\''~tag-2'\'' &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/tags/multi_hierarchy/@: badRefName: invalid refname format
+	error: refs/tags/multi_hierarchy/~tag-2: badRefName: invalid refname format
 	EOF
-	rm $tag_dir_prefix/multi_hierarchy/@ &&
+	rm $tag_dir_prefix/multi_hierarchy/'\''~tag-2'\'' &&
 	test_cmp expect err &&
 
 	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
@@ -60,6 +67,15 @@ test_expect_success 'ref name should be checked' '
 	error: refs/tags/.lock: badRefName: invalid refname format
 	EOF
 	rm $tag_dir_prefix/.lock &&
+	test_cmp expect err &&
+
+	mkdir $tag_dir_prefix/'\''~new-feature'\'' &&
+	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/'\''~new-feature'\''/tag-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/~new-feature/tag-1: badRefName: invalid refname format
+	EOF
+	rm -rf $tag_dir_prefix/'\''~new-feature'\'' &&
 	test_cmp expect err
 '
 
@@ -84,7 +100,7 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	rm $branch_dir_prefix/.branch-1 &&
 	test_cmp expect err &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\''~branch-1'\'' &&
 	git -c fsck.badRefName=ignore refs verify 2>err &&
 	test_must_be_empty err
 '
@@ -114,13 +130,13 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 		git update-ref refs/worktree/branch-4 refs/heads/branch-3
 	) &&
 
-	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/.branch-2 &&
-	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/@ &&
+	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/'\'' branch-5'\'' &&
+	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/'\''~branch-6'\'' &&
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/worktree/.branch-2: badRefName: invalid refname format
-	error: refs/worktree/@: badRefName: invalid refname format
+	error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+	error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err &&
@@ -129,8 +145,8 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 		cd worktree-1 &&
 		test_must_fail git refs verify 2>err &&
 		cat >expect <<-EOF &&
-		error: refs/worktree/.branch-2: badRefName: invalid refname format
-		error: refs/worktree/@: badRefName: invalid refname format
+		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
 		EOF
 		sort err >sorted_err &&
 		test_cmp expect sorted_err
@@ -140,8 +156,8 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 		cd worktree-2 &&
 		test_must_fail git refs verify 2>err &&
 		cat >expect <<-EOF &&
-		error: refs/worktree/.branch-2: badRefName: invalid refname format
-		error: refs/worktree/@: badRefName: invalid refname format
+		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
 		EOF
 		sort err >sorted_err &&
 		test_cmp expect sorted_err
@@ -190,7 +206,7 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/branch-no-newline: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	rm $branch_dir_prefix/branch-no-newline &&
 	test_cmp expect err &&
@@ -198,7 +214,7 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
 	EOF
 	rm $branch_dir_prefix/branch-garbage &&
 	test_cmp expect err &&
@@ -206,7 +222,7 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/tags/tag-garbage-1: unofficialFormattedRef: has trailing garbage: '\''
+	warning: refs/tags/tag-garbage-1: trailingRefContent: has trailing garbage: '\''
 
 
 	'\''
@@ -217,7 +233,7 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/tags/tag-garbage-2: unofficialFormattedRef: has trailing garbage: '\''
+	warning: refs/tags/tag-garbage-2: trailingRefContent: has trailing garbage: '\''
 
 
 	  garbage'\''
@@ -228,16 +244,16 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	printf "%s    garbage\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/tags/tag-garbage-3: unofficialFormattedRef: has trailing garbage: '\''    garbage
+	warning: refs/tags/tag-garbage-3: trailingRefContent: has trailing garbage: '\''    garbage
 	a'\''
 	EOF
 	rm $tag_dir_prefix/tag-garbage-3 &&
 	test_cmp expect err &&
 
 	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
-	test_must_fail git -c fsck.unofficialFormattedRef=error refs verify 2>err &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/tags/tag-garbage-4: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
+	error: refs/tags/tag-garbage-4: trailingRefContent: has trailing garbage: '\'' garbage'\''
 	EOF
 	rm $tag_dir_prefix/tag-garbage-4 &&
 	test_cmp expect err
@@ -266,8 +282,8 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
 	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
 	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
-	warning: refs/heads/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
-	warning: refs/heads/branch-no-newline: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
@@ -287,10 +303,15 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	rm $branch_dir_prefix/branch-good &&
 	test_must_be_empty err &&
 
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-head &&
+	test_must_be_empty err &&
+
 	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/branch-no-newline-1: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
 	EOF
 	rm $branch_dir_prefix/branch-no-newline-1 &&
 	test_cmp expect err &&
@@ -298,8 +319,8 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: misses LF at the end
-	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
 	EOF
 	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
 	test_cmp expect err &&
@@ -307,7 +328,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/a/b/branch-trailing-2: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
 	EOF
 	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
 	test_cmp expect err &&
@@ -315,7 +336,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/a/b/branch-trailing-3: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
 	EOF
 	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
 	test_cmp expect err &&
@@ -323,8 +344,8 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: misses LF at the end
-	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
 	EOF
 	rm $branch_dir_prefix/a/b/branch-complicated &&
 	test_cmp expect err &&
@@ -332,7 +353,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-1: badReferent: points to invalid refname '\''refs/heads/.branch'\''
+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
 	EOF
 	rm $branch_dir_prefix/branch-bad-1 &&
 	test_cmp expect err
@@ -348,6 +369,7 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	mkdir -p "$branch_dir_prefix/a/b" &&
 
 	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
 	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
 	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
 	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
@@ -357,20 +379,20 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-1: badReferent: points to invalid refname '\''refs/heads/.branch'\''
-	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: has trailing whitespaces or newlines
-	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: misses LF at the end
-	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: has trailing whitespaces or newlines
-	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: misses LF at the end
-	warning: refs/heads/a/b/branch-trailing-2: unofficialFormattedRef: has trailing whitespaces or newlines
-	warning: refs/heads/a/b/branch-trailing-3: unofficialFormattedRef: has trailing whitespaces or newlines
-	warning: refs/heads/branch-no-newline-1: unofficialFormattedRef: misses LF at the end
+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
 '
 
-test_expect_success 'textual symref should be checked whether it is escaped' '
+test_expect_success 'the target of the textual symref should be checked' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	branch_dir_prefix=.git/refs/heads &&
@@ -379,48 +401,71 @@ test_expect_success 'textual symref should be checked whether it is escaped' '
 	test_commit default &&
 	mkdir -p "$branch_dir_prefix/a/b" &&
 
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/foo\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
 	printf "ref: refs-back/heads/main\n" >$branch_dir_prefix/branch-bad-1 &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/branch-bad-1: escapeReferent: referent '\''refs-back/heads/main'\'' is outside of refs/ or worktrees/
+	warning: refs/heads/branch-bad-1: symrefTargetIsNotARef: points to non-ref target '\''refs-back/heads/main'\''
 	EOF
 	rm $branch_dir_prefix/branch-bad-1 &&
 	test_cmp expect err
 '
 
-test_expect_success 'textual symref escape check should work with worktrees' '
+test_expect_success SYMLINKS 'symlink symref content should be checked' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
 	cd repo &&
 	test_commit default &&
-	git branch branch-1 &&
-	git branch branch-2 &&
-	git branch branch-3 &&
-	git worktree add ./worktree-1 branch-2 &&
-	git worktree add ./worktree-2 branch-3 &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
 
-	(
-		cd worktree-1 &&
-		git branch refs/worktree/w1-branch &&
-		git symbolic-ref refs/worktree/branch-4 refs/heads/branch-1 &&
-		git symbolic-ref refs/worktree/branch-5 worktrees/worktree-2/refs/worktree/w2-branch
-	) &&
-	(
-		cd worktree-2 &&
-		git branch refs/worktree/w2-branch &&
-		git symbolic-ref refs/worktree/branch-4 refs/heads/branch-1 &&
-		git symbolic-ref refs/worktree/branch-5 worktrees/worktree-1/refs/worktree/w1-branch
-	) &&
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
 
+	ln -sf ../../logs/branch-escape $branch_dir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
 
-	git symbolic-ref refs/heads/branch-5 worktrees/worktree-1/refs/worktree/w1-branch &&
-	git symbolic-ref refs/heads/branch-6 worktrees/worktree-2/refs/worktree/w2-branch &&
+	ln -sf ./"branch   " $branch_dir_prefix/branch-symbolic-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-bad: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-bad: badReferentName: points to invalid refname '\''refs/heads/branch   '\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-bad &&
+	test_cmp expect err &&
 
-	git refs verify 2>err &&
-	test_must_be_empty err
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferentName: points to invalid refname '\''refs/tags/.tag'\''
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err
 '
 
-test_expect_success 'all textual symref checks should work with worktrees' '
+test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	cd repo &&
@@ -449,7 +494,7 @@ test_expect_success 'all textual symref checks should work with worktrees' '
 	printf "%s" $bad_content_1 >$worktree1_refdir_prefix/bad-branch-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/worktree/bad-branch-1: badRefContent: $bad_content_1
+	error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content_1
 	EOF
 	rm $worktree1_refdir_prefix/bad-branch-1 &&
 	test_cmp expect err &&
@@ -457,7 +502,7 @@ test_expect_success 'all textual symref checks should work with worktrees' '
 	printf "%s" $bad_content_2 >$worktree2_refdir_prefix/bad-branch-2 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/worktree/bad-branch-2: badRefContent: $bad_content_2
+	error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content_2
 	EOF
 	rm $worktree2_refdir_prefix/bad-branch-2 &&
 	test_cmp expect err &&
@@ -465,7 +510,7 @@ test_expect_success 'all textual symref checks should work with worktrees' '
 	printf "%s" $bad_content_3 >$worktree1_refdir_prefix/bad-branch-3 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/worktree/bad-branch-3: badRefContent: $bad_content_3
+	error: worktrees/worktree-1/refs/worktree/bad-branch-3: badRefContent: $bad_content_3
 	EOF
 	rm $worktree1_refdir_prefix/bad-branch-3 &&
 	test_cmp expect err &&
@@ -473,61 +518,17 @@ test_expect_success 'all textual symref checks should work with worktrees' '
 	printf "%s" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-no-newline &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/worktree/branch-no-newline: unofficialFormattedRef: misses LF at the end
+	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	rm $worktree1_refdir_prefix/branch-no-newline &&
 	test_cmp expect err &&
 
-	printf "%s garbage" "$(git rev-parse HEAD)" >$worktree2_refdir_prefix/branch-garbage &&
-	git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	warning: refs/worktree/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\''
-	EOF
-	rm $worktree2_refdir_prefix/branch-garbage
-'
-
-test_expect_success SYMLINKS 'symlink symref content should be checked (individual)' '
-	test_when_finished "rm -rf repo" &&
-	git init repo &&
-	branch_dir_prefix=.git/refs/heads &&
-	tag_dir_prefix=.git/refs/tags &&
-	cd repo &&
-	test_commit default &&
-	mkdir -p "$branch_dir_prefix/a/b" &&
-
-	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
-	git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
-	EOF
-	rm $branch_dir_prefix/branch-symbolic-good &&
-	test_cmp expect err &&
-
-	ln -sf ../../logs/branch-escape $branch_dir_prefix/branch-symbolic &&
+	printf "%s garbage" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-garbage &&
 	git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	warning: refs/heads/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
-	warning: refs/heads/branch-symbolic: escapeReferent: referent '\''logs/branch-escape'\'' is outside of refs/ or worktrees/
+	warning: worktrees/worktree-1/refs/worktree/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
 	EOF
-	rm $branch_dir_prefix/branch-symbolic &&
-	test_cmp expect err &&
-
-	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-bad &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	warning: refs/heads/branch-symbolic-bad: symlinkRef: use deprecated symbolic link for symref
-	error: refs/heads/branch-symbolic-bad: badReferent: points to invalid refname '\''refs/heads/branch   space'\''
-	EOF
-	rm $branch_dir_prefix/branch-symbolic-bad &&
-	test_cmp expect err &&
-
-	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
-	error: refs/tags/tag-symbolic-1: badReferent: points to invalid refname '\''refs/tags/.tag'\''
-	EOF
-	rm $tag_dir_prefix/tag-symbolic-1 &&
+	rm $worktree1_refdir_prefix/branch-garbage &&
 	test_cmp expect err
 '
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 209+ messages in thread

* [PATCH v6 1/9] ref: initialize "fsck_ref_report" with zero
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
@ 2024-10-21 13:34             ` shejialuo
  2024-10-21 13:34             ` [PATCH v6 2/9] ref: check the full refname instead of basename shejialuo
                               ` (10 subsequent siblings)
  11 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:34 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So we must always initialize these members to NULL
when creating a new "fsck_ref_report" structure, rather than leaving
them pointing to arbitrary memory.

The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL, which implicitly zero-initializes the
other members of the struct. It is more customary to use "{ 0 }" to
express that we are zero-initializing everything. To align with the
rest of the codebase, initialize "fsck_ref_report" with "{ 0 }".

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0824c0b8a9..03d2503276 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,7 +3520,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.47.0



* [PATCH v6 2/9] ref: check the full refname instead of basename
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
  2024-10-21 13:34             ` [PATCH v6 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-10-21 13:34             ` shejialuo
  2024-10-21 15:38               ` karthik nayak
  2024-11-05  7:11               ` Patrick Steinhardt
  2024-10-21 13:34             ` [PATCH v6 3/9] ref: initialize target name outside of check functions shejialuo
                               ` (9 subsequent siblings)
  11 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:34 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "files-backend.c::files_fsck_refs_name", we validate the refname
format by using "check_refname_format" to check the basename of the
iterator with "REFNAME_ALLOW_ONELEVEL" flag.

However, this is a bad implementation. Although we don't allow a single
"@" as a ref directly in the ".git" directory, we do allow
"refs/heads/@". Because only the basename is checked, as a one-level
refname, we wrongly report an error for a "refs/heads/@" ref, whose
basename is "@".

Because we only check one level of the refname, we also cannot check
the other components of the full refname, so we miss errors such as
the following:

  "refs/heads/ new-feature/test"
  "refs/heads/~new-feature/test"

To fix these problems, make "files_fsck_refs_name" pass the full refname
to "check_refname_format" (without "REFNAME_ALLOW_ONELEVEL"). Then
replace the tests related to "@" and add tests that exercise the cases
above.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     |  4 ++--
 t/t0602-reffiles-fsck.sh | 30 +++++++++++++++++++++++-------
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 03d2503276..f246c92684 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3519,10 +3519,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
 		goto cleanup;
 
-	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
+	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
+	if (check_refname_format(sb.buf, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..0aee377439 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -25,6 +25,13 @@ test_expect_success 'ref name should be checked' '
 	git tag tag-2 &&
 	git tag multi_hierarchy/tag-2 &&
 
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	EOF
+	test_must_be_empty err &&
+	rm $branch_dir_prefix/@ &&
+
 	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
@@ -33,20 +40,20 @@ test_expect_success 'ref name should be checked' '
 	rm $branch_dir_prefix/.branch-1 &&
 	test_cmp expect err &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\'' branch-1'\'' &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/@: badRefName: invalid refname format
+	error: refs/heads/ branch-1: badRefName: invalid refname format
 	EOF
-	rm $branch_dir_prefix/@ &&
+	rm $branch_dir_prefix/'\'' branch-1'\'' &&
 	test_cmp expect err &&
 
-	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/@ &&
+	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/'\''~tag-2'\'' &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/tags/multi_hierarchy/@: badRefName: invalid refname format
+	error: refs/tags/multi_hierarchy/~tag-2: badRefName: invalid refname format
 	EOF
-	rm $tag_dir_prefix/multi_hierarchy/@ &&
+	rm $tag_dir_prefix/multi_hierarchy/'\''~tag-2'\'' &&
 	test_cmp expect err &&
 
 	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
@@ -60,6 +67,15 @@ test_expect_success 'ref name should be checked' '
 	error: refs/tags/.lock: badRefName: invalid refname format
 	EOF
 	rm $tag_dir_prefix/.lock &&
+	test_cmp expect err &&
+
+	mkdir $tag_dir_prefix/'\''~new-feature'\'' &&
+	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/'\''~new-feature'\''/tag-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/~new-feature/tag-1: badRefName: invalid refname format
+	EOF
+	rm -rf $tag_dir_prefix/'\''~new-feature'\'' &&
 	test_cmp expect err
 '
 
@@ -84,7 +100,7 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	rm $branch_dir_prefix/.branch-1 &&
 	test_cmp expect err &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\''~branch-1'\'' &&
 	git -c fsck.badRefName=ignore refs verify 2>err &&
 	test_must_be_empty err
 '
-- 
2.47.0



* [PATCH v6 3/9] ref: initialize target name outside of check functions
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
  2024-10-21 13:34             ` [PATCH v6 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-10-21 13:34             ` [PATCH v6 2/9] ref: check the full refname instead of basename shejialuo
@ 2024-10-21 13:34             ` shejialuo
  2024-10-21 15:49               ` karthik nayak
  2024-11-05  7:11               ` Patrick Steinhardt
  2024-10-21 13:34             ` [PATCH v6 4/9] ref: support multiple worktrees check for refs shejialuo
                               ` (8 subsequent siblings)
  11 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:34 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We pass "refs_check_dir" to the "files_fsck_refs_name" function, which
uses it to build the name of the checked ref. However, every new check
function we introduce would have to recompute the same target name.
Repeating this computation is wasteful. Instead, we should compute it
only once and pass the target name to the check functions.

To avoid the repeated computation, rename the "refs_check_dir" parameter
to "target_name". In "files_fsck_refs_dir", create a new strbuf
"target_name". For each new target, compute the name once and then call
the check functions one by one.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f246c92684..fbfcd1115c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3501,12 +3501,12 @@ static int files_ref_store_remove_on_disk(struct ref_store *ref_store,
  */
 typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  struct fsck_options *o,
-				  const char *refs_check_dir,
+				  const char *target_name,
 				  struct dir_iterator *iter);
 
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
-				const char *refs_check_dir,
+				const char *target_name,
 				struct dir_iterator *iter)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -3519,11 +3519,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
 		goto cleanup;
 
-	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
-	if (check_refname_format(sb.buf, 0)) {
+	if (check_refname_format(target_name, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		report.path = sb.buf;
+		report.path = target_name;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
 				      "invalid refname format");
@@ -3539,6 +3538,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       const char *refs_check_dir,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
+	struct strbuf target_name = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	struct dir_iterator *iter;
 	int iter_status;
@@ -3557,11 +3557,15 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			continue;
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
+			strbuf_reset(&target_name);
+			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
+				    iter->relative_path);
+
 			if (o->verbose)
-				fprintf_ln(stderr, "Checking %s/%s",
-					   refs_check_dir, iter->relative_path);
+				fprintf_ln(stderr, "Checking %s", target_name.buf);
+
 			for (size_t i = 0; fsck_refs_fn[i]; i++) {
-				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
+				if (fsck_refs_fn[i](ref_store, o, target_name.buf, iter))
 					ret = -1;
 			}
 		} else {
@@ -3578,6 +3582,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 
 out:
 	strbuf_release(&sb);
+	strbuf_release(&target_name);
 	return ret;
 }
 
-- 
2.47.0



* [PATCH v6 4/9] ref: support multiple worktrees check for refs
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (2 preceding siblings ...)
  2024-10-21 13:34             ` [PATCH v6 3/9] ref: initialize target name outside of check functions shejialuo
@ 2024-10-21 13:34             ` shejialuo
  2024-10-21 15:56               ` karthik nayak
  2024-11-05  7:11               ` Patrick Steinhardt
  2024-10-21 13:34             ` [PATCH v6 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
                               ` (7 subsequent siblings)
  11 siblings, 2 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:34 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already set up the infrastructure to check the consistency of
refs, but it does not support multiple worktrees. As we are going to add
more checks for ref content, we first need to support multiple
worktrees.

Because each worktree has its own specific refs, instead of just showing
the user "refs/worktree/foo", we need to display the full name, such as
"worktrees/<id>/refs/worktree/foo". To build the full name, we need to
know the id of the worktree. Add a new parameter "struct worktree *" to
"refs-internal.h::fsck_fn", and change the related functions to follow
this new interface.

The "packed-refs" file only exists in the main worktree, so we should
only check "packed-refs" in the main worktree. Use the
"is_main_worktree" function to skip checking "packed-refs" in the
"packed_fsck" function.

Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add
the "worktrees/<id>/" prefix when we are not in the main worktree.

Finally, add a new test that checks refnames when there are multiple
worktrees, to exercise the code.
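
As an illustration only, the naming rule above can be sketched in
standalone C. The "struct wt_info" type and the
"format_ref_display_name" function are hypothetical stand-ins, not
Git's "struct worktree" or any Git API; the prefix format follows the
"worktrees/%s/" string used in "files_fsck_refs_dir" in the diff below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's "struct worktree"; only the two
 * fields this sketch needs are modelled.
 */
struct wt_info {
	const char *id;  /* worktree id, e.g. "worktree-1"; NULL for main */
	int is_main;     /* non-zero for the main worktree */
};

/*
 * Write the user-facing name of the loose ref file found at
 * "<refs_check_dir>/<relative_path>" in worktree "wt" into "out".
 * Linked worktrees get a "worktrees/<id>/" prefix.
 * Returns 0 on success, -1 if "out" is too small.
 */
static int format_ref_display_name(char *out, size_t outsz,
				   const struct wt_info *wt,
				   const char *refs_check_dir,
				   const char *relative_path)
{
	int n;

	assert(out && wt && refs_check_dir && relative_path);
	if (wt->is_main)
		n = snprintf(out, outsz, "%s/%s",
			     refs_check_dir, relative_path);
	else
		n = snprintf(out, outsz, "worktrees/%s/%s/%s",
			     wt->id, refs_check_dir, relative_path);

	/* snprintf() reports the length it wanted; detect truncation. */
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For a linked worktree with id "worktree-1", "refs" plus "worktree/foo"
yields "worktrees/worktree-1/refs/worktree/foo", while the main
worktree keeps the plain "refs/..." name.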

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/refs.c           | 12 ++++++--
 refs.c                   |  5 ++--
 refs.h                   |  3 +-
 refs/debug.c             |  5 ++--
 refs/files-backend.c     | 17 ++++++++----
 refs/packed-backend.c    |  8 +++++-
 refs/refs-internal.h     |  3 +-
 refs/reftable-backend.c  |  3 +-
 t/t0602-reffiles-fsck.sh | 59 ++++++++++++++++++++++++++++++++++++++++
 9 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/builtin/refs.c b/builtin/refs.c
index 24978a7b7b..886c4ceae3 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "refs.h"
 #include "strbuf.h"
+#include "worktree.h"
 
 #define REFS_MIGRATE_USAGE \
 	N_("git refs migrate --ref-format=<format> [--dry-run]")
@@ -66,6 +67,7 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
 static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 {
 	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
+	struct worktree **worktrees, **p;
 	const char * const verify_usage[] = {
 		REFS_VERIFY_USAGE,
 		NULL,
@@ -75,7 +77,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
 		OPT_END(),
 	};
-	int ret;
+	int ret = 0;
 
 	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
 	if (argc)
@@ -84,9 +86,15 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 	git_config(git_fsck_config, &fsck_refs_options);
 	prepare_repo_settings(the_repository);
 
-	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
+	worktrees = get_worktrees();
+	for (p = worktrees; *p; p++) {
+		struct worktree *wt = *p;
+		ret |= refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options, wt);
+	}
+
 
 	fsck_options_clear(&fsck_refs_options);
+	free_worktrees(worktrees);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 5f729ed412..395a17273c 100644
--- a/refs.c
+++ b/refs.c
@@ -318,9 +318,10 @@ int check_refname_format(const char *refname, int flags)
 	return check_or_sanitize_refname(refname, flags, NULL);
 }
 
-int refs_fsck(struct ref_store *refs, struct fsck_options *o)
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt)
 {
-	return refs->be->fsck(refs, o);
+	return refs->be->fsck(refs, o, wt);
 }
 
 void sanitize_refname_component(const char *refname, struct strbuf *out)
diff --git a/refs.h b/refs.h
index 108dfc93b3..341d43239c 100644
--- a/refs.h
+++ b/refs.h
@@ -549,7 +549,8 @@ int check_refname_format(const char *refname, int flags);
  * reflogs are consistent, and non-zero otherwise. The errors will be
  * written to stderr.
  */
-int refs_fsck(struct ref_store *refs, struct fsck_options *o);
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt);
 
 /*
  * Apply the rules from check_refname_format, but mutate the result until it
diff --git a/refs/debug.c b/refs/debug.c
index 45e2e784a0..72e80ddd6d 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -420,10 +420,11 @@ static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
 }
 
 static int debug_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
-	int res = drefs->refs->be->fsck(drefs->refs, o);
+	int res = drefs->refs->be->fsck(drefs->refs, o, wt);
 	trace_printf_key(&trace_refs, "fsck: %d\n", res);
 	return res;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index fbfcd1115c..24ad73faba 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -23,6 +23,7 @@
 #include "../dir.h"
 #include "../chdir-notify.h"
 #include "../setup.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../revision.h"
@@ -3536,6 +3537,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       struct fsck_options *o,
 			       const char *refs_check_dir,
+			       struct worktree *wt,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
 	struct strbuf target_name = STRBUF_INIT;
@@ -3558,6 +3560,9 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
 			strbuf_reset(&target_name);
+
+			if (!is_main_worktree(wt))
+				strbuf_addf(&target_name, "worktrees/%s/", wt->id);
 			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
 				    iter->relative_path);
 
@@ -3587,7 +3592,8 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 }
 
 static int files_fsck_refs(struct ref_store *ref_store,
-			   struct fsck_options *o)
+			   struct fsck_options *o,
+			   struct worktree *wt)
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
@@ -3596,17 +3602,18 @@ static int files_fsck_refs(struct ref_store *ref_store,
 
 	if (o->verbose)
 		fprintf_ln(stderr, _("Checking references consistency"));
-	return files_fsck_refs_dir(ref_store, o,  "refs", fsck_refs_fn);
+	return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
 }
 
 static int files_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ, "fsck");
 
-	return files_fsck_refs(ref_store, o) |
-	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
+	return files_fsck_refs(ref_store, o, wt) |
+	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt);
 }
 
 struct ref_storage_be refs_be_files = {
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 07c57fd541..46dcaec654 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -13,6 +13,7 @@
 #include "../lockfile.h"
 #include "../chdir-notify.h"
 #include "../statinfo.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../trace2.h"
@@ -1754,8 +1755,13 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 }
 
 static int packed_fsck(struct ref_store *ref_store UNUSED,
-		       struct fsck_options *o UNUSED)
+		       struct fsck_options *o UNUSED,
+		       struct worktree *wt)
 {
+
+	if (!is_main_worktree(wt))
+		return 0;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..037d7991cd 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -653,7 +653,8 @@ typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refnam
 				 struct strbuf *referent);
 
 typedef int fsck_fn(struct ref_store *ref_store,
-		    struct fsck_options *o);
+		    struct fsck_options *o,
+		    struct worktree *wt);
 
 struct ref_storage_be {
 	const char *name;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index f5f957e6de..b6a63c1015 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2443,7 +2443,8 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 }
 
 static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
-			    struct fsck_options *o UNUSED)
+			    struct fsck_options *o UNUSED,
+			    struct worktree *wt UNUSED)
 {
 	return 0;
 }
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 0aee377439..6eb1385c50 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -105,4 +105,63 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'ref name check should work for multiple worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	cd repo &&
+	test_commit initial &&
+	git checkout -b branch-1 &&
+	test_commit second &&
+	git checkout -b branch-2 &&
+	test_commit third &&
+	git checkout -b branch-3 &&
+	git worktree add ./worktree-1 branch-1 &&
+	git worktree add ./worktree-2 branch-2 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+
+	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/'\'' branch-5'\'' &&
+	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/'\''~branch-6'\'' &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+	error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err &&
+
+	(
+		cd worktree-1 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+		EOF
+		sort err >sorted_err &&
+		test_cmp expect sorted_err
+	) &&
+
+	(
+		cd worktree-2 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+		EOF
+		sort err >sorted_err &&
+		test_cmp expect sorted_err
+	)
+'
+
 test_done
-- 
2.47.0


* [PATCH v6 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (3 preceding siblings ...)
  2024-10-21 13:34             ` [PATCH v6 4/9] ref: support multiple worktrees check for refs shejialuo
@ 2024-10-21 13:34             ` shejialuo
  2024-11-05  7:11               ` Patrick Steinhardt
  2024-10-21 13:34             ` [PATCH v6 6/9] ref: add more strict checks for regular refs shejialuo
                               ` (6 subsequent siblings)
  11 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:34 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

"git-fsck(1)" implicitly checks the ref content by passing the
callback "fsck_handle_ref" to "refs.c::refs_for_each_rawref". The
callback then checks whether the ref content (ultimately the "oid") is
valid. If it is not, it reports the following error to the user:

  error: refs/heads/main: invalid sha1 pointer 0000...

However, it also wrongly reports the above error when there are
dangling symrefs in the repository. This does not align with the
behavior of the "git symbolic-ref" command, which allows users to create
dangling symrefs.

As we have already introduced the "git refs verify" command, we should
check the ref content explicitly in "git refs verify". Later, we could
then remove these checks from "git-fsck(1)" and have it launch a
subprocess that calls "git refs verify", which makes "git-fsck(1)"
cleaner.

Following what "git-fsck(1)" does, add a similar check to "git refs
verify". Then add a new fsck error message "badRefContent(ERROR)" to
indicate that a ref has invalid content.
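
A rough standalone model of what makes a regular ref's content "bad"
here, assuming the rule applied by "parse_loose_ref_contents" is: the
content must start with a full-length hex object ID followed by either
the end of the content or a whitespace character. The function name is
hypothetical, and this sketch accepts hex digits in either case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Sketch of the rule that separates valid from "bad" regular ref
 * content: exactly "hexsz" hex digits (40 for SHA-1, 64 for SHA-256),
 * then either the end of the content or a whitespace character.
 * On success, "*rest" points just past the object ID.
 * Returns 0 if the content is acceptable, -1 for badRefContent.
 */
static int check_regular_ref_content(const char *buf, size_t hexsz,
				     const char **rest)
{
	size_t i;

	assert(buf && rest);
	for (i = 0; i < hexsz; i++) {
		/* Also stops at '\0', so short content is rejected safely. */
		if (!isxdigit((unsigned char)buf[i]))
			return -1;
	}
	/* Reject junk glued directly to the object ID, e.g. "<oid>x". */
	if (buf[hexsz] != '\0' && !isspace((unsigned char)buf[hexsz]))
		return -1;

	*rest = buf + hexsz;
	return 0;
}
```

With a "hexsz" of 40 (SHA-1), the three bad contents used in the tests
below ("<oid>x", "xfsazqfxcadas" and "Xfsazqfxcadas") are all rejected.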

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  43 +++++++++++++
 t/t0602-reffiles-fsck.sh      | 117 ++++++++++++++++++++++++++++++++++
 4 files changed, 164 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 24ad73faba..2861980bdd 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3505,6 +3505,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *target_name,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *target_name,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct fsck_ref_report report = { 0 };
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	report.path = target_name;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "cannot read ref file '%s': (%s)",
+				      iter->path.buf, strerror(errno));
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		strbuf_rtrim(&ref_content);
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "%s", ref_content.buf);
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *target_name,
@@ -3597,6 +3639,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 6eb1385c50..29bdd3fc01 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -164,4 +164,121 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 	)
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	bad_content=$(git rev-parse main)x &&
+	printf "%s" $bad_content >$tag_dir_prefix/tag-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content
+	EOF
+	rm $tag_dir_prefix/tag-bad-1 &&
+	test_cmp expect err &&
+
+	bad_content=xfsazqfxcadas &&
+	printf "%s" $bad_content >$tag_dir_prefix/tag-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content
+	EOF
+	rm $tag_dir_prefix/tag-bad-2 &&
+	test_cmp expect err &&
+
+	bad_content=Xfsazqfxcadas &&
+	printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	bad_content_1=$(git rev-parse main)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
+	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
+	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
+test_expect_success 'ref content checks should work with worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	bad_content_1=$(git rev-parse HEAD)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+
+	printf "%s" $bad_content_1 >$worktree1_refdir_prefix/bad-branch-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content_1
+	EOF
+	rm $worktree1_refdir_prefix/bad-branch-1 &&
+	test_cmp expect err &&
+
+	printf "%s" $bad_content_2 >$worktree2_refdir_prefix/bad-branch-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content_2
+	EOF
+	rm $worktree2_refdir_prefix/bad-branch-2 &&
+	test_cmp expect err &&
+
+	printf "%s" $bad_content_3 >$worktree1_refdir_prefix/bad-branch-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: worktrees/worktree-1/refs/worktree/bad-branch-3: badRefContent: $bad_content_3
+	EOF
+	rm $worktree1_refdir_prefix/bad-branch-3 &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.47.0


* [PATCH v6 6/9] ref: add more strict checks for regular refs
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (4 preceding siblings ...)
  2024-10-21 13:34             ` [PATCH v6 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-10-21 13:34             ` shejialuo
  2024-10-21 13:35             ` [PATCH v6 7/9] ref: add basic symref content check for files backend shejialuo
                               ` (5 subsequent siblings)
  11 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:34 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We already use the "parse_loose_ref_contents" function to check whether
the ref content is valid in the files backend. However,
"parse_loose_ref_contents" allows the ref's content to end with garbage
or without a newline.

Even though we never create such loose refs ourselves, we have always
accepted them, so it is entirely possible that some third-party tools
rely on such loose refs being valid. We should therefore not report an
error fsck message for now. Instead, we should notify users about such
"curiously formatted" loose refs so that adequate care is taken before
we decide to tighten the rules in the future.

Reporting a warn fsck message to the user is not suitable either. We
don't yet want the "--strict" flag, which turns warnings into errors, to
end up generating errors for such weirdly-formatted reference contents,
as we first want to assess whether this retroactive tightening would
cause issues for any tools out there; it could cause compatibility
issues that break repositories. So we add the following two fsck infos
to represent the situations where the ref content ends without a
newline or has trailing garbage:

1. refMissingNewline(INFO): A loose ref does not end with a newline
   (LF).
2. trailingRefContent(INFO): A loose ref has trailing content.
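
Once an object ID has parsed, the choice between these two infos only
depends on what follows it. A minimal sketch mirroring the "trailing"
logic added to "files_fsck_refs_content" below; the enum and function
names are hypothetical:

```c
#include <assert.h>

/* Hypothetical result codes mirroring the two new fsck infos. */
enum ref_tail_status {
	REF_TAIL_OK,               /* exactly one "\n" after the object ID */
	REF_TAIL_MISSING_NEWLINE,  /* refMissingNewline */
	REF_TAIL_TRAILING_CONTENT, /* trailingRefContent */
};

/*
 * "rest" points just past the object ID of a regular ref whose content
 * already parsed successfully (i.e. at '\0' or a whitespace character).
 */
static enum ref_tail_status classify_ref_tail(const char *rest)
{
	assert(rest);
	if (rest[0] == '\0')
		return REF_TAIL_MISSING_NEWLINE;
	/* Anything other than a single final LF counts as trailing content. */
	if (rest[0] != '\n' || rest[1] != '\0')
		return REF_TAIL_TRAILING_CONTENT;
	return REF_TAIL_OK;
}
```

An empty tail maps to refMissingNewline, while " garbage" and "\n\n\n"
both map to trailingRefContent, matching the expectations in the new
tests.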

It might appear that using FSCK_INFO prevents us from warning the user
at all. However, "fsck.c::fsck_vreport" converts FSCK_INFO to
FSCK_WARN, so we can still warn the user about these situations when
running "git refs verify" without introducing compatibility issues.
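
The reasoning relies on the order in which severities are adjusted. The
following is a simplified model based on our reading of
"fsck.c::fsck_msg_type" and "fsck.c::fsck_vreport", not the actual
code; the enum and function names are hypothetical, and per-message
overrides such as "fsck.trailingRefContent=error" are not modelled:

```c
#include <assert.h>

/* Severity levels, modelled on (not copied from) fsck.h. */
enum msg_type { MSG_IGNORE, MSG_INFO, MSG_WARN, MSG_ERROR, MSG_FATAL };

/*
 * Map a message's default severity to the severity that is reported.
 * "--strict" only promotes WARN to ERROR, and it is applied before the
 * INFO-to-WARN conversion.
 */
static enum msg_type effective_msg_type(enum msg_type level, int strict)
{
	enum msg_type t = level;

	if (strict && t == MSG_WARN)
		t = MSG_ERROR;
	if (t == MSG_FATAL)
		t = MSG_ERROR;
	else if (t == MSG_INFO)
		t = MSG_WARN;	/* shown as a warning, never as an error */
	return t;
}
```

Because "--strict" is applied before INFO becomes WARN, an INFO message
is printed as a warning but is never escalated to an error by
"--strict" alone.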

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt | 14 ++++++++
 fsck.h                        |  2 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 26 ++++++++++++--
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 67 +++++++++++++++++++++++++++++++++++
 6 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 22c385ea22..6db0eaa84a 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -173,6 +173,20 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A loose ref that does not end with newline(LF). As
+	valid implementations of Git never created such a loose ref
+	file, it may become an error in the future. Report to the
+	git@vger.kernel.org mailing list if you see this error, as
+	we need to know what tools created such a file.
+
+`trailingRefContent`::
+	(INFO) A loose ref has trailing content. As valid implementations
+	of Git never created such a loose ref file, it may become an
+	error in the future. Report to the git@vger.kernel.org mailing
+	list if you see this error, as we need to know what tools
+	created such a file.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 0d99a87911..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -85,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 395a17273c..f88b32a633 100644
--- a/refs.c
+++ b/refs.c
@@ -1789,7 +1789,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2861980bdd..b1fba92e5f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -569,7 +569,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -606,7 +606,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -628,6 +628,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3513,6 +3517,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
+	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
 	struct object_id oid;
@@ -3533,7 +3538,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
 				     ref_content.buf, &oid, &referent,
-				     &type, &failure_errno)) {
+				     &type, &trailing, &failure_errno)) {
 		strbuf_rtrim(&ref_content);
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
@@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
+	if (!(type & REF_ISSYMREF)) {
+		if (!*trailing) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "misses LF at the end");
+			goto cleanup;
+		}
+		if (*trailing != '\n' || *(trailing + 1)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "has trailing garbage: '%s'", trailing);
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 037d7991cd..125f1fe735 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -716,7 +716,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 29bdd3fc01..0418d79c4f 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -201,6 +201,61 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
 	EOF
 	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err &&
+
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-1: trailingRefContent: has trailing garbage: '\''
+
+
+	'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-1 &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-2: trailingRefContent: has trailing garbage: '\''
+
+
+	  garbage'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-2 &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-3: trailingRefContent: has trailing garbage: '\''    garbage
+	a'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-3 &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-garbage-4: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $tag_dir_prefix/tag-garbage-4 &&
 	test_cmp expect err
 '
 
@@ -219,12 +274,16 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
 	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
 	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
 	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
 	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
@@ -278,6 +337,14 @@ test_expect_success 'ref content checks should work with worktrees' '
 	error: worktrees/worktree-1/refs/worktree/bad-branch-3: badRefContent: $bad_content_3
 	EOF
 	rm $worktree1_refdir_prefix/bad-branch-3 &&
+	test_cmp expect err &&
+
+	printf "%s" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $worktree1_refdir_prefix/branch-no-newline &&
 	test_cmp expect err
 '
 
-- 
2.47.0


* [PATCH v6 7/9] ref: add basic symref content check for files backend
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (5 preceding siblings ...)
  2024-10-21 13:34             ` [PATCH v6 6/9] ref: add more strict checks for regular refs shejialuo
@ 2024-10-21 13:35             ` shejialuo
  2024-10-21 13:35             ` [PATCH v6 8/9] ref: check whether the target of the symref is a ref shejialuo
                               ` (4 subsequent siblings)
  11 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:35 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have code that checks regular ref contents, but we do not yet check
the contents of symbolic refs. By using "parse_loose_ref_contents" for
symbolic refs, we obtain the "referent".

We do not need to check the "referent" by opening the file. This is
because if "referent" exists in the file system, we will eventually
check its correctness by inspecting every file in the "refs" directory.
If the "referent" does not exist in the filesystem, this is OK as it is
seen as the dangling symref.

So we only need to check the string content of the "referent". A ref
file is accepted as a textual symref if its content begins with "ref:",
followed by zero or more whitespace characters, followed by the full
refname, followed only by whitespace characters. However, Git itself
always writes a single SP after "ref:" and a single LF after the
refname. It may seem that we should report an fsck error when the
content does not follow these stricter rules, but we should not be so
aggressive, because third-party reimplementations of Git may have taken
advantage of the looser syntax. More specifically, we accept the
following contents:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"
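
A minimal sketch of the accepted "ref:" syntax, for illustration only:
the helper name "sketch_symref_referent" is hypothetical, and this is not
Git's "parse_loose_ref_content". It assumes the whole file content is
already in memory as a NUL-terminated string, skips "ref:" plus any
whitespace, and returns the rest untouched, so trailing whitespace and
newlines survive for the later checks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): if 'content' is a textual
 * symref, return a pointer to the referent, which may still carry
 * trailing whitespace or newlines; otherwise return NULL.
 */
static const char *sketch_symref_referent(const char *content)
{
	static const char prefix[] = "ref:";
	size_t prefix_len = sizeof(prefix) - 1;

	assert(content != NULL);
	if (strncmp(content, prefix, prefix_len))
		return NULL;
	content += prefix_len;

	/* Skip zero or more whitespace characters after "ref:". */
	while (isspace((unsigned char)*content))
		content++;
	return content;
}
```

For example 1 above it would return "refs/heads/master   ", and for a
regular ref that holds an object ID it would return NULL.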

When introducing the regular ref content checks, we created two fsck
INFO messages, "refMissingNewline" and "trailingRefContent", which
represent exactly the situations above. So we reuse these two fsck
messages to inform the user about these situations.

But we do not allow any other trailing garbage. The following are bad
symref contents, which will be reported as fsck errors by "git-fsck(1)":

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

We introduce a new "badReferentName(ERROR)" fsck message to report the
errors above, using "is_root_ref" and "check_refname_format" to check
the "referent". Since both "is_root_ref" and "check_refname_format"
would reject trailing whitespace, we pass the trimmed version of the
"referent" to these functions.
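
To make the "badReferentName" rule concrete, here is a hypothetical,
simplified re-reading of the rules documented in git-check-ref-format(1);
it is not Git's "check_refname_format", and "sketch_refname_is_valid" is
a made-up name. Like a call with flags 0, it rejects one-level names such
as "HEAD", which is why root refs have to be accepted by "is_root_ref"
first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Forbidden anywhere: control chars, DEL, SP ~ ^ : ? * [ and backslash. */
static bool sketch_bad_refname_char(unsigned char c)
{
	return c < 040 || c == 0177 || strchr(" ~^:?*[\\", c) != NULL;
}

/*
 * Hypothetical subset of git-check-ref-format(1) with no flags:
 * returns true if 'name' looks like a valid multi-level refname.
 */
static bool sketch_refname_is_valid(const char *name)
{
	const char *comp = name; /* start of the current component */
	size_t len;

	assert(name != NULL);
	len = strlen(name);

	/* Need at least two components; "@" alone is forbidden. */
	if (!len || !strcmp(name, "@") || !strchr(name, '/'))
		return false;
	/* No leading or trailing slash, and no trailing dot. */
	if (name[0] == '/' || name[len - 1] == '/' || name[len - 1] == '.')
		return false;
	/* No "..", no empty component, no "@{" sequence. */
	if (strstr(name, "..") || strstr(name, "//") || strstr(name, "@{"))
		return false;

	for (const char *p = name; ; p++) {
		if (*p == '/' || *p == '\0') {
			size_t clen = (size_t)(p - comp);

			/* A component may not start with '.' or end with ".lock". */
			if (comp[0] == '.' ||
			    (clen >= 5 && !strncmp(p - 5, ".lock", 5)))
				return false;
			if (*p == '\0')
				break;
			comp = p + 1;
		} else if (sketch_bad_refname_char((unsigned char)*p)) {
			return false;
		}
	}
	return true;
}
```

Under this sketch "refs/heads/.branch" and the untrimmed
"refs/heads/branch   " are both rejected, while "refs/heads/main" passes.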

In order to add these checks, we do the following:

1. Record the untrimmed length "orig_len" and the untrimmed last byte
   "orig_last_byte".
2. Use "strbuf_rtrim" to trim trailing whitespace and newlines, so that
   "is_root_ref" and "check_refname_format" do not fail because of them.
3. Use "orig_len" and "orig_last_byte" to check whether the "referent"
   is missing '\n' at the end or has trailing whitespace or newlines.
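
The three steps can be sketched with plain C strings instead of Git's
strbuf API. This is an illustration under assumptions:
"sketch_classify_referent" and the SKETCH_* flags are hypothetical, the
fsck reports are replaced by returned bits, and trimming uses the C
library's isspace(), which may differ slightly from Git's own ctype table.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result bits standing in for the two fsck INFO reports. */
enum {
	SKETCH_MISSING_NEWLINE = 1 << 0,  /* would be refMissingNewline */
	SKETCH_TRAILING_CONTENT = 1 << 1, /* would be trailingRefContent */
};

/*
 * Classify the referent string (everything after "ref:" and its leading
 * whitespace). The buffer is right-trimmed in place, like strbuf_rtrim(),
 * so that the caller can then validate the bare refname.
 */
static int sketch_classify_referent(char *buf)
{
	size_t orig_len, len;
	char orig_last_byte;
	int flags = 0;

	assert(buf != NULL);
	orig_len = strlen(buf);
	if (!orig_len)
		return 0; /* nothing to classify; guards buf[orig_len - 1] */

	/* Step 1: remember the untrimmed length and last byte. */
	orig_last_byte = buf[orig_len - 1];

	/* Step 2: trim trailing whitespace and newlines. */
	len = orig_len;
	while (len && isspace((unsigned char)buf[len - 1]))
		len--;
	buf[len] = '\0';

	/* Step 3: the same two conditions as files_fsck_symref_target(). */
	if (len == orig_len || (len < orig_len && orig_last_byte != '\n'))
		flags |= SKETCH_MISSING_NEWLINE;
	if (len != orig_len && len != orig_len - 1)
		flags |= SKETCH_TRAILING_CONTENT;
	return flags;
}
```

With these conditions, "refs/heads/branch\n" yields no flags,
"refs/heads/branch" only the missing-newline bit, and
"refs/heads/branch \n  " both bits, which matches the expectations in the
tests of this patch.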

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  40 ++++++++++++
 t/t0602-reffiles-fsck.sh      | 111 ++++++++++++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 6db0eaa84a..dcea05edfc 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..5227dfdef2 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b1fba92e5f..1a267547f2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3509,6 +3509,43 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *target_name,
 				  struct dir_iterator *iter);
 
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent)
+{
+	char orig_last_byte;
+	size_t orig_len;
+	int ret = 0;
+
+	orig_len = referent->len;
+	orig_last_byte = referent->buf[orig_len - 1];
+	strbuf_rtrim(referent);
+
+	if (!is_root_ref(referent->buf) &&
+	    check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_NAME,
+				      "points to invalid refname '%s'", referent->buf);
+		goto out;
+	}
+
+	if (referent->len == orig_len ||
+	    (referent->len < orig_len && orig_last_byte != '\n')) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "misses LF at the end");
+	}
+
+	if (referent->len != orig_len && referent->len != orig_len - 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "has trailing whitespaces or newlines");
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *target_name,
@@ -3559,6 +3596,9 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "has trailing garbage: '%s'", trailing);
 			goto cleanup;
 		}
+	} else {
+		ret = files_fsck_symref_target(o, &report, &referent);
+		goto cleanup;
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 0418d79c4f..f475966d7b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -289,6 +289,109 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-head &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -345,6 +448,14 @@ test_expect_success 'ref content checks should work with worktrees' '
 	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $worktree1_refdir_prefix/branch-garbage &&
 	test_cmp expect err
 '
 
-- 
2.47.0


* [PATCH v6 8/9] ref: check whether the target of the symref is a ref
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (6 preceding siblings ...)
  2024-10-21 13:35             ` [PATCH v6 7/9] ref: add basic symref content check for files backend shejialuo
@ 2024-10-21 13:35             ` shejialuo
  2024-10-21 13:35             ` [PATCH v6 9/9] ref: add symlink ref content check for files backend shejialuo
                               ` (3 subsequent siblings)
  11 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:35 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Ideally, we want users to use "git symbolic-ref" to create symrefs
instead of writing raw contents into the filesystem. However, "git
symbolic-ref" is strict about the refname but not about the referent.
For example, we can make the "referent" point to "$(gitdir)/logs/aaa"
and manually write an object ID into that file; "git rev-parse" can
still resolve this symref successfully:

  $ git init repo && cd repo && git commit --allow-empty -mx
  $ git symbolic-ref refs/heads/test logs/aaa
  $ echo $(git rev-parse HEAD) > .git/logs/aaa
  $ git rev-parse test

We may need to restrict the "referent" parameter of "git symbolic-ref"
when creating symrefs, because ideally all non-pseudo refs should be
located under the "refs" directory, and we may tighten this in the
future.

To let users know that we may tighten this rule, create a new fsck
message "symrefTargetIsNotARef(INFO)" to notify the user that this
situation may become an error in the future.
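
A minimal sketch of the new prefix test, assuming the referent has
already been trimmed. "sketch_target_is_not_a_ref" and
"sketch_starts_with" are hypothetical stand-ins; the patch itself calls
Git's "starts_with" and "is_root_ref", and the latter is modelled here
only as a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for Git's starts_with(). */
static bool sketch_starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * True when a trimmed referent would be reported as symrefTargetIsNotARef:
 * not a root ref, and located neither under "refs/" nor "worktrees/".
 * 'is_root' models the result of is_root_ref(), which is not reproduced.
 */
static bool sketch_target_is_not_a_ref(const char *referent, bool is_root)
{
	assert(referent != NULL);
	return !is_root &&
	       !sketch_starts_with(referent, "refs/") &&
	       !sketch_starts_with(referent, "worktrees/");
}
```

In this sketch "logs/aaa" from the example above is flagged, while
"refs/foo" and a root ref such as "HEAD" are not.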

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 +++++++++
 fsck.h                        |  1 +
 refs/files-backend.c          | 14 ++++++++++++--
 t/t0602-reffiles-fsck.sh      | 28 ++++++++++++++++++++++++++++
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index dcea05edfc..f82ebc58e8 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,15 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symrefTargetIsNotARef`::
+	(INFO) The target of a symbolic reference points neither to
+	a root reference nor to a reference starting with "refs/".
+	Although `git symbolic-ref` allows creating a symref whose referent
+	is outside the "refs" directory, we may tighten
+	the rule in the future. Report to the git@vger.kernel.org
+	mailing list if you see this error, as we need to know what tools
+	created such a file.
+
 `trailingRefContent`::
 	(INFO) A loose ref has trailing content. As valid implementations
 	of Git never created such a loose ref file, it may become an
diff --git a/fsck.h b/fsck.h
index 5227dfdef2..53a47612e6 100644
--- a/fsck.h
+++ b/fsck.h
@@ -87,6 +87,7 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1a267547f2..b4912af3b5 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3513,6 +3513,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    struct strbuf *referent)
 {
+	int is_referent_root;
 	char orig_last_byte;
 	size_t orig_len;
 	int ret = 0;
@@ -3521,8 +3522,17 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	orig_last_byte = referent->buf[orig_len - 1];
 	strbuf_rtrim(referent);
 
-	if (!is_root_ref(referent->buf) &&
-	    check_refname_format(referent->buf, 0)) {
+	is_referent_root = is_root_ref(referent->buf);
+	if (!is_referent_root &&
+	    !starts_with(referent->buf, "refs/") &&
+	    !starts_with(referent->buf, "worktrees/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF,
+				      "points to non-ref target '%s'", referent->buf);
+
+	}
+
+	if (!is_referent_root && check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT_NAME,
 				      "points to invalid refname '%s'", referent->buf);
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index f475966d7b..c6d40ce9a1 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -392,6 +392,34 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'the target of the textual symref should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/foo\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs-back/heads/main\n" >$branch_dir_prefix/branch-bad-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-bad-1: symrefTargetIsNotARef: points to non-ref target '\''refs-back/heads/main'\''
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0


* [PATCH v6 9/9] ref: add symlink ref content check for files backend
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (7 preceding siblings ...)
  2024-10-21 13:35             ` [PATCH v6 8/9] ref: check whether the target of the symref is a ref shejialuo
@ 2024-10-21 13:35             ` shejialuo
  2024-10-21 16:09             ` [PATCH v6 0/9] add " Taylor Blau
                               ` (2 subsequent siblings)
  11 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-21 13:35 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Besides textual symrefs, we also allow symbolic links as symrefs, so we
should provide the same consistency checks for them as we do for
textual symrefs. We are also considering deprecating writing symbolic
links, but first we need to assess whether symbolic links are still
being used. So, add a new fsck message "symlinkRef(INFO)" to make the
user aware of this.

We have already introduced "files_fsck_symref_target", and we should
reuse this function to handle symrefs that use legacy symbolic links.
However, we should not check for trailing garbage in symbolic links, so
add a new parameter "symbolic_link" to disable the checks which should
only be executed for textual symrefs.

We also need to generate the "referent" argument for
"files_fsck_symref_target", using the following steps:

1. Use "strbuf_add_real_path" to resolve the symlink and get the
   absolute path "ref_content" which the symlink ref points to.
2. Generate the absolute, normalized path "abs_gitdir" of "gitdir"
   (ending in a directory separator) and strip the "abs_gitdir" prefix
   from "ref_content" to extract the relative path
   "relative_referent_path".
3. If "ref_content" is outside of "gitdir", we just set "referent" to
   "ref_content". Otherwise, we set "referent" to
   "relative_referent_path".
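
Steps 2 and 3 can be sketched as a prefix strip on plain strings. This is
a hypothetical illustration, not the patch's code: "sketch_symlink_referent"
is a made-up name, both paths are assumed to be absolute and normalized
already (the patch uses "strbuf_add_absolute_path", "strbuf_normalize_path"
and "strbuf_add_real_path" for that), only '/' is treated as a directory
separator, and a fixed-size buffer is used for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the absolute gitdir and the symlink's fully resolved target,
 * write into 'out' the string that would be passed on as "referent".
 */
static void sketch_symlink_referent(const char *abs_gitdir,
				    const char *resolved,
				    char *out, size_t outsz)
{
	char prefix[4096];
	size_t len = strlen(abs_gitdir);

	assert(out != NULL && outsz > 0);

	/* Make sure the prefix ends in '/', as the patch does with strbuf_addch(). */
	snprintf(prefix, sizeof(prefix), "%s%s", abs_gitdir,
		 (len && abs_gitdir[len - 1] == '/') ? "" : "/");

	if (!strncmp(resolved, prefix, strlen(prefix)))
		/* Inside gitdir: keep the path relative to it. */
		snprintf(out, outsz, "%s", resolved + strlen(prefix));
	else
		/* Outside gitdir: fall back to the resolved path as-is. */
		snprintf(out, outsz, "%s", resolved);
}
```

With a gitdir of "/tmp/repo/.git", a symlink resolving to
"/tmp/repo/.git/logs/branch-escape" yields "logs/branch-escape", which
"files_fsck_symref_target" would then report as "symrefTargetIsNotARef".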

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  6 +++++
 fsck.h                        |  1 +
 refs/files-backend.c          | 38 +++++++++++++++++++++++++----
 t/t0602-reffiles-fsck.sh      | 45 +++++++++++++++++++++++++++++++++++
 4 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index f82ebc58e8..b14bc44ca4 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,12 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symlinkRef`::
+	(INFO) A symbolic link is used as a symref. Report to the
+	git@vger.kernel.org mailing list if you see this error, as we
+	are assessing the feasibility of dropping support for
+	creating symbolic links as symrefs.
+
 `symrefTargetIsNotARef`::
 	(INFO) The target of a symbolic reference points neither to
 	a root reference nor to a reference starting with "refs/".
diff --git a/fsck.h b/fsck.h
index 53a47612e6..a44c231a5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -86,6 +86,7 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
 	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b4912af3b5..180f8e28b7 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,6 +1,7 @@
 #define USE_THE_REPOSITORY_VARIABLE
 
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../config.h"
 #include "../copy.h"
 #include "../environment.h"
@@ -3511,7 +3512,8 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
-				    struct strbuf *referent)
+				    struct strbuf *referent,
+				    unsigned int symbolic_link)
 {
 	int is_referent_root;
 	char orig_last_byte;
@@ -3520,7 +3522,8 @@ static int files_fsck_symref_target(struct fsck_options *o,
 
 	orig_len = referent->len;
 	orig_last_byte = referent->buf[orig_len - 1];
-	strbuf_rtrim(referent);
+	if (!symbolic_link)
+		strbuf_rtrim(referent);
 
 	is_referent_root = is_root_ref(referent->buf);
 	if (!is_referent_root &&
@@ -3539,6 +3542,9 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
+	if (symbolic_link)
+		goto out;
+
 	if (referent->len == orig_len ||
 	    (referent->len < orig_len && orig_last_byte != '\n')) {
 		ret = fsck_report_ref(o, report,
@@ -3562,6 +3568,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct dir_iterator *iter)
 {
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
 	const char *trailing = NULL;
@@ -3572,8 +3579,30 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	report.path = target_name;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char* relative_referent_path = NULL;
+
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&ref_content, iter->path.buf);
+		skip_prefix(ref_content.buf, abs_gitdir.buf,
+			    &relative_referent_path);
+
+		if (relative_referent_path)
+			strbuf_addstr(&referent, relative_referent_path);
+		else
+			strbuf_addbuf(&referent, &ref_content);
+
+		ret |= files_fsck_symref_target(o, &report, &referent, 1);
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		ret = fsck_report_ref(o, &report,
@@ -3607,13 +3636,14 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 			goto cleanup;
 		}
 	} else {
-		ret = files_fsck_symref_target(o, &report, &referent);
+		ret = files_fsck_symref_target(o, &report, &referent, 0);
 		goto cleanup;
 	}
 
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index c6d40ce9a1..aee7e04b82 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -420,6 +420,51 @@ test_expect_success 'the target of the textual symref should be checked' '
 	test_cmp expect err
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-escape $branch_dir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   " $branch_dir_prefix/branch-symbolic-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-bad: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-bad: badReferentName: points to invalid refname '\''refs/heads/branch   '\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-bad &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferentName: points to invalid refname '\''refs/tags/.tag'\''
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0


* Re: [PATCH v6 2/9] ref: check the full refname instead of basename
  2024-10-21 13:34             ` [PATCH v6 2/9] ref: check the full refname instead of basename shejialuo
@ 2024-10-21 15:38               ` karthik nayak
  2024-10-22 11:42                 ` shejialuo
  2024-11-05  7:11               ` Patrick Steinhardt
  1 sibling, 1 reply; 209+ messages in thread
From: karthik nayak @ 2024-10-21 15:38 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

shejialuo <shejialuo@gmail.com> writes:

[snip]

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 71a4d1a5ae..0aee377439 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -25,6 +25,13 @@ test_expect_success 'ref name should be checked' '
>  	git tag tag-2 &&
>  	git tag multi_hierarchy/tag-2 &&
>
> +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> +	git refs verify 2>err &&
> +	cat >expect <<-EOF &&
> +	EOF
> +	test_must_be_empty err &&
> +	rm $branch_dir_prefix/@ &&
> +
>  	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
>  	test_must_fail git refs verify 2>err &&
>  	cat >expect <<-EOF &&
> @@ -33,20 +40,20 @@ test_expect_success 'ref name should be checked' '
>  	rm $branch_dir_prefix/.branch-1 &&
>  	test_cmp expect err &&
>
> -	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\'' branch-1'\'' &&

Nit: Here and below we could use ${SQ} instead.

[snip]

* Re: [PATCH v6 3/9] ref: initialize target name outside of check functions
  2024-10-21 13:34             ` [PATCH v6 3/9] ref: initialize target name outside of check functions shejialuo
@ 2024-10-21 15:49               ` karthik nayak
  2024-11-05  7:11               ` Patrick Steinhardt
  1 sibling, 0 replies; 209+ messages in thread
From: karthik nayak @ 2024-10-21 15:49 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

shejialuo <shejialuo@gmail.com> writes:

> We pass "refs_check_dir" to the "files_fsck_refs_name" function, which
> allows it to create the checked ref name later. However, when we
> introduce a new check function, we have to re-calculate the target name.
> It's bad for us to repeat the calculation. Instead, we should calculate
> it only once and pass the target name to the check functions.
>
> In order not to repeat the calculation, rename "refs_check_dir" to
> "target_name". And in "files_fsck_refs_dir", create a new strbuf

Nit: Why `target_name` and not simply `target`?

> "target_name", thus whenever we handle a new target, calculate the
> name and call the check functions one by one.
>
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  refs/files-backend.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>

[snip]

* Re: [PATCH v6 4/9] ref: support multiple worktrees check for refs
  2024-10-21 13:34             ` [PATCH v6 4/9] ref: support multiple worktrees check for refs shejialuo
@ 2024-10-21 15:56               ` karthik nayak
  2024-10-22 11:44                 ` shejialuo
  2024-11-05  7:11               ` Patrick Steinhardt
  1 sibling, 1 reply; 209+ messages in thread
From: karthik nayak @ 2024-10-21 15:56 UTC (permalink / raw)
  To: shejialuo, git; +Cc: Patrick Steinhardt, Junio C Hamano

shejialuo <shejialuo@gmail.com> writes:

[snip]

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 0aee377439..6eb1385c50 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -105,4 +105,63 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
>  	test_must_be_empty err
>  '
>
> +test_expect_success 'ref name check should work for multiple worktrees' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +
> +	cd repo &&
> +	test_commit initial &&
> +	git checkout -b branch-1 &&
> +	test_commit second &&
> +	git checkout -b branch-2 &&
> +	test_commit third &&
> +	git checkout -b branch-3 &&
> +	git worktree add ./worktree-1 branch-1 &&
> +	git worktree add ./worktree-2 branch-2 &&
> +	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
> +	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
> +
> +	(
> +		cd worktree-1 &&
> +		git update-ref refs/worktree/branch-4 refs/heads/branch-3
> +	) &&
> +	(
> +		cd worktree-2 &&
> +		git update-ref refs/worktree/branch-4 refs/heads/branch-3
> +	) &&
> +
> +	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/'\'' branch-5'\'' &&
> +	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/'\''~branch-6'\'' &&
> +
> +	test_must_fail git refs verify 2>err &&
> +	cat >expect <<-EOF &&
> +	error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
> +	error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
> +	EOF
> +	sort err >sorted_err &&
> +	test_cmp expect sorted_err &&
> +
> +	(
> +		cd worktree-1 &&
> +		test_must_fail git refs verify 2>err &&
> +		cat >expect <<-EOF &&
> +		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
> +		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
> +		EOF
> +		sort err >sorted_err &&
> +		test_cmp expect sorted_err
> +	) &&
> +
> +	(
> +		cd worktree-2 &&
> +		test_must_fail git refs verify 2>err &&
> +		cat >expect <<-EOF &&
> +		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
> +		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
> +		EOF
> +		sort err >sorted_err &&
> +		test_cmp expect sorted_err
> +	)

These last three blocks are the same; couldn't we use a loop?

for dir in "." "worktree-1" "worktree-2"
do
    ...
done

> +'
> +
>  test_done
> --
> 2.47.0


* Re: [PATCH v6 0/9] add ref content check for files backend
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (8 preceding siblings ...)
  2024-10-21 13:35             ` [PATCH v6 9/9] ref: add symlink ref content check for files backend shejialuo
@ 2024-10-21 16:09             ` Taylor Blau
  2024-10-22 11:41               ` shejialuo
  2024-10-21 16:18             ` Taylor Blau
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
  11 siblings, 1 reply; 209+ messages in thread
From: Taylor Blau @ 2024-10-21 16:09 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak, Junio C Hamano

On Mon, Oct 21, 2024 at 09:32:20PM +0800, shejialuo wrote:
> Hi All:
>
> This new version updates the following things.

I am assuming that this new round was rebased onto the tip of 'master',
since I could not apply it on top of its original base

  b3d175409d9 (Merge branch 'sj/ref-fsck', 2024-08-16)

In the future, please indicate when you rebase your series so that I
know what the correct base is for that round.

Thanks,
Taylor

* Re: [PATCH v6 0/9] add ref content check for files backend
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (9 preceding siblings ...)
  2024-10-21 16:09             ` [PATCH v6 0/9] add " Taylor Blau
@ 2024-10-21 16:18             ` Taylor Blau
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
  11 siblings, 0 replies; 209+ messages in thread
From: Taylor Blau @ 2024-10-21 16:18 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak, Junio C Hamano

On Mon, Oct 21, 2024 at 09:32:20PM +0800, shejialuo wrote:
> shejialuo (9):
>   ref: initialize "fsck_ref_report" with zero
>   ref: check the full refname instead of basename
>   ref: initialize target name outside of check functions
>   ref: support multiple worktrees check for refs
>   ref: port git-fsck(1) regular refs check for files backend
>   ref: add more strict checks for regular refs
>   ref: add basic symref content check for files backend
>   ref: check whether the target of the symref is a ref
>   ref: add symlink ref content check for files backend
>
>  Documentation/fsck-msgids.txt |  35 +++
>  builtin/refs.c                |  12 +-
>  fsck.h                        |   6 +
>  refs.c                        |   7 +-
>  refs.h                        |   3 +-
>  refs/debug.c                  |   5 +-
>  refs/files-backend.c          | 187 ++++++++++++--
>  refs/packed-backend.c         |   8 +-
>  refs/refs-internal.h          |   5 +-
>  refs/reftable-backend.c       |   3 +-
>  t/t0602-reffiles-fsck.sh      | 457 +++++++++++++++++++++++++++++++++-
>  11 files changed, 693 insertions(+), 35 deletions(-)

Great, thanks for the new round. Looking at the inter-diff, it looks
like this round also needs a fresh review. I'm catching up on new
threads from the weekend, so I'll put this on my review queue. But in
the meantime, if your mentors can look at it, that would be much
appreciated.

Thanks,
Taylor

* Re: [PATCH v6 0/9] add ref content check for files backend
  2024-10-21 16:09             ` [PATCH v6 0/9] add " Taylor Blau
@ 2024-10-22 11:41               ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-22 11:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Patrick Steinhardt, Karthik Nayak, Junio C Hamano

On Mon, Oct 21, 2024 at 12:09:44PM -0400, Taylor Blau wrote:
> On Mon, Oct 21, 2024 at 09:32:20PM +0800, shejialuo wrote:
> > Hi All:
> >
> > This new version updates the following things.
> 
> I am assuming that this new round was rebased onto the tip of 'master',
> since I could not apply it on top of its original base
> 
>   b3d175409d9 (Merge branch 'sj/ref-fsck', 2024-08-16)
> 
> In the future, please indicate when you rebase your series so that I
> know what the correct base is for that round.
> 

Sorry about that, Taylor. I told Junio that I had rebased the series in
the previous version, but I forgot that you have become the interim
maintainer and didn't provide this information to you.

Thanks,
Jialuo

> Thanks,
> Taylor

* Re: [PATCH v6 2/9] ref: check the full refname instead of basename
  2024-10-21 15:38               ` karthik nayak
@ 2024-10-22 11:42                 ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-22 11:42 UTC (permalink / raw)
  To: karthik nayak; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Mon, Oct 21, 2024 at 10:38:02AM -0500, karthik nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> [snip]
> 
> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > index 71a4d1a5ae..0aee377439 100755
> > --- a/t/t0602-reffiles-fsck.sh
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -25,6 +25,13 @@ test_expect_success 'ref name should be checked' '
> >  	git tag tag-2 &&
> >  	git tag multi_hierarchy/tag-2 &&
> >
> > +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> > +	git refs verify 2>err &&
> > +	cat >expect <<-EOF &&
> > +	EOF
> > +	test_must_be_empty err &&
> > +	rm $branch_dir_prefix/@ &&
> > +
> >  	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
> >  	test_must_fail git refs verify 2>err &&
> >  	cat >expect <<-EOF &&
> > @@ -33,20 +40,20 @@ test_expect_success 'ref name should be checked' '
> >  	rm $branch_dir_prefix/.branch-1 &&
> >  	test_cmp expect err &&
> >
> > -	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> > +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\'' branch-1'\'' &&
> 
> Nit: Here and below we could use ${SQ} instead.
> 

I agree.

> [snip]



* Re: [PATCH v6 4/9] ref: support multiple worktrees check for refs
  2024-10-21 15:56               ` karthik nayak
@ 2024-10-22 11:44                 ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-10-22 11:44 UTC (permalink / raw)
  To: karthik nayak; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Mon, Oct 21, 2024 at 10:56:30AM -0500, karthik nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> [snip]
> 
> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > index 0aee377439..6eb1385c50 100755
> > --- a/t/t0602-reffiles-fsck.sh
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -105,4 +105,63 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
> >  	test_must_be_empty err
> >  '
> >
> > +test_expect_success 'ref name check should work for multiple worktrees' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +
> > +	cd repo &&
> > +	test_commit initial &&
> > +	git checkout -b branch-1 &&
> > +	test_commit second &&
> > +	git checkout -b branch-2 &&
> > +	test_commit third &&
> > +	git checkout -b branch-3 &&
> > +	git worktree add ./worktree-1 branch-1 &&
> > +	git worktree add ./worktree-2 branch-2 &&
> > +	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
> > +	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
> > +
> > +	(
> > +		cd worktree-1 &&
> > +		git update-ref refs/worktree/branch-4 refs/heads/branch-3
> > +	) &&
> > +	(
> > +		cd worktree-2 &&
> > +		git update-ref refs/worktree/branch-4 refs/heads/branch-3
> > +	) &&
> > +
> > +	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/'\'' branch-5'\'' &&
> > +	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/'\''~branch-6'\'' &&
> > +
> > +	test_must_fail git refs verify 2>err &&
> > +	cat >expect <<-EOF &&
> > +	error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
> > +	error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
> > +	EOF
> > +	sort err >sorted_err &&
> > +	test_cmp expect sorted_err &&
> > +
> > +	(
> > +		cd worktree-1 &&
> > +		test_must_fail git refs verify 2>err &&
> > +		cat >expect <<-EOF &&
> > +		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
> > +		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
> > +		EOF
> > +		sort err >sorted_err &&
> > +		test_cmp expect sorted_err
> > +	) &&
> > +
> > +	(
> > +		cd worktree-2 &&
> > +		test_must_fail git refs verify 2>err &&
> > +		cat >expect <<-EOF &&
> > +		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
> > +		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
> > +		EOF
> > +		sort err >sorted_err &&
> > +		test_cmp expect sorted_err
> > +	)
> 
> These last three loops are the same, couldn't we loop?
> 
> for dir in "." "worktree-1" "worktree-2"
> do
>     ...
> done
> 

Actually, I think all the tests could be written that way. I will
refactor them in the next version to make the tests cleaner.

Thanks,
Jialuo

* Re: [PATCH v6 2/9] ref: check the full refname instead of basename
  2024-10-21 13:34             ` [PATCH v6 2/9] ref: check the full refname instead of basename shejialuo
  2024-10-21 15:38               ` karthik nayak
@ 2024-11-05  7:11               ` Patrick Steinhardt
  2024-11-06 12:37                 ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-05  7:11 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 21, 2024 at 09:34:22PM +0800, shejialuo wrote:
> In "files-backend.c::files_fsck_refs_name", we validate the refname
> format by using "check_refname_format" to check the basename of the
> iterator with "REFNAME_ALLOW_ONELEVEL" flag.
> 
> However, this is a bad implementation. Although we doesn't allow a
> single "@" in ".git" directory, we do allow "refs/heads/@". So, we will
> report an error wrongly when there is a "refs/heads/@" ref by using one
> level refname "@".
> 
> Because we just check one level refname, we either cannot check the
> other parts of the full refname. And we will ignore the following
> errors:
> 
>   "refs/heads/ new-feature/test"
>   "refs/heads/~new-feature/test"
> 
> In order to fix the above problem, enhance "files_fsck_refs_name" to use
> the full name for "check_refname_format". Then, replace the tests which
> are related to "@" and add tests to exercise the above situations.

Okay, makes sense.
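To make the basename-versus-full-refname difference concrete, here is a minimal C sketch. `toy_check_refname()` and `TOY_ALLOW_ONELEVEL` are hypothetical names; the function implements only a simplified subset of the rules documented in git-check-ref-format(1) and is not git's `check_refname_format()`. Under those assumptions, the basename `test` passes as a one-level name even though `refs/heads/ new-feature/test` is invalid, while `refs/heads/@` passes and a bare `@` does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, modeled on REFNAME_ALLOW_ONELEVEL. */
#define TOY_ALLOW_ONELEVEL 1

/* Control characters, DEL, and the punctuation refnames may not contain. */
static int toy_bad_char(unsigned char c)
{
	return c < 0x20 || c == 0x7f || strchr(" ~^:?*[\\", c) != NULL;
}

/* Returns 0 if NAME passes this simplified check, -1 otherwise. */
static int toy_check_refname(const char *name, int flags)
{
	const char *comp = name;

	assert(name != NULL);
	if (!strcmp(name, "@"))
		return -1; /* the refname as a whole may not be "@" */
	if (!strchr(name, '/') && !(flags & TOY_ALLOW_ONELEVEL))
		return -1; /* one-level names need the flag */

	for (;;) {
		size_t len = strcspn(comp, "/");

		if (len == 0 || comp[0] == '.')
			return -1; /* empty component or leading '.' */
		if (len >= 5 && !strncmp(comp + len - 5, ".lock", 5))
			return -1; /* component ends with ".lock" */
		for (size_t i = 0; i < len; i++) {
			if (toy_bad_char((unsigned char)comp[i]))
				return -1;
			if (i + 1 < len && comp[i] == '.' && comp[i + 1] == '.')
				return -1; /* ".." */
			if (i + 1 < len && comp[i] == '@' && comp[i + 1] == '{')
				return -1; /* "@{" */
		}
		if (comp[len] == '\0')
			return comp[len - 1] == '.' ? -1 : 0; /* no trailing '.' */
		comp += len + 1; /* skip the '/' */
	}
}
```

In this sketch a one-level root ref name such as `HEAD` is also rejected when the flag is not given, so a check on full names cannot be applied to root refs unchanged.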

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 03d2503276..f246c92684 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3519,10 +3519,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
>  	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
>  		goto cleanup;
>  
> -	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
> +	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
> +	if (check_refname_format(sb.buf, 0)) {
>  		struct fsck_ref_report report = { 0 };
>  
> -		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
>  		report.path = sb.buf;
>  		ret = fsck_report_ref(o, &report,
>  				      FSCK_MSG_BAD_REF_NAME,

So this only works right now because we never check root refs in the
first place? Maybe that is worth a comment.

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 71a4d1a5ae..0aee377439 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -25,6 +25,13 @@ test_expect_success 'ref name should be checked' '
>  	git tag tag-2 &&
>  	git tag multi_hierarchy/tag-2 &&
>  
> +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> +	git refs verify 2>err &&
> +	cat >expect <<-EOF &&
> +	EOF
> +	test_must_be_empty err &&
> +	rm $branch_dir_prefix/@ &&

`expect` isn't used here as you use `test_must_be_empty`.

>  	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
>  	test_must_fail git refs verify 2>err &&
>  	cat >expect <<-EOF &&
> @@ -33,20 +40,20 @@ test_expect_success 'ref name should be checked' '
>  	rm $branch_dir_prefix/.branch-1 &&
>  	test_cmp expect err &&
>  
> -	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\'' branch-1'\'' &&
>  	test_must_fail git refs verify 2>err &&
>  	cat >expect <<-EOF &&
> -	error: refs/heads/@: badRefName: invalid refname format
> +	error: refs/heads/ branch-1: badRefName: invalid refname format
>  	EOF
> -	rm $branch_dir_prefix/@ &&
> +	rm $branch_dir_prefix/'\'' branch-1'\'' &&
>  	test_cmp expect err &&

Okay, we now allow `refs/heads/@`, but still don't allow other bad
formatting like spaces in the refname.

Patrick

* Re: [PATCH v6 3/9] ref: initialize target name outside of check functions
  2024-10-21 13:34             ` [PATCH v6 3/9] ref: initialize target name outside of check functions shejialuo
  2024-10-21 15:49               ` karthik nayak
@ 2024-11-05  7:11               ` Patrick Steinhardt
  2024-11-06 12:32                 ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-05  7:11 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 21, 2024 at 09:34:31PM +0800, shejialuo wrote:
> We passes "refs_check_dir" to the "files_fsck_refs_name" function which
> allows it to create the checked ref name later. However, when we
> introduce a new check function, we have to re-calculate the target name.
> It's bad for us to do repeat calculation. Instead, we should calculate
> it only once and pass the target name to the check functions.

It would be nice to clarify what exactly is bad about it. Does it create
extra memory churn? Or is this about not duplicating logic?

> In order not to do repeat calculation, rename "refs_check_dir" to
> "target_name". And in "files_fsck_refs_dir", create a new strbuf
> "target_name", thus whenever we handle a new target, calculate the
> name and call the check functions one by one.

"target_name" is somewhat of a weird name. I'd expect that this is
either the path to the reference, in which case I'd call this "path", or
the name of the reference that is to be checked, in which case I'd call
this "refname".

> @@ -3539,6 +3538,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
>  			       const char *refs_check_dir,
>  			       files_fsck_refs_fn *fsck_refs_fn)
>  {
> +	struct strbuf target_name = STRBUF_INIT;
>  	struct strbuf sb = STRBUF_INIT;
>  	struct dir_iterator *iter;
>  	int iter_status;
> @@ -3557,11 +3557,15 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
>  			continue;
>  		} else if (S_ISREG(iter->st.st_mode) ||
>  			   S_ISLNK(iter->st.st_mode)) {
> +			strbuf_reset(&target_name);
> +			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
> +				    iter->relative_path);
> +
>  			if (o->verbose)
> -				fprintf_ln(stderr, "Checking %s/%s",
> -					   refs_check_dir, iter->relative_path);
> +				fprintf_ln(stderr, "Checking %s", target_name.buf);
> +
>  			for (size_t i = 0; fsck_refs_fn[i]; i++) {
> -				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
> +				if (fsck_refs_fn[i](ref_store, o, target_name.buf, iter))
>  					ret = -1;
>  			}
>  		} else {

The change itself does make sense though. We indeed avoid reallocating
the array for every single ref, which is a worthwhile change.

I was wondering whether we could reuse `sb` here, but we do use it at
the end of the function to potentially print an error message.
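The pattern in this patch, building the name once per file with `strbuf_reset()`/`strbuf_addf()` and handing the same string to every `fsck_refs_fn` entry, can be sketched in plain C. `struct toy_buf`, `toy_check_dir()` and the two checks are hypothetical stand-ins, not git's strbuf API or real fsck checks; the counter only shows that resetting keeps the capacity, so the buffer is enlarged only when a longer name arrives.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf: a growable string buffer. */
struct toy_buf {
	char *buf;
	size_t len, alloc;
};

/* Append "dir/rel", growing the allocation only when it is too small. */
static void toy_buf_add_path(struct toy_buf *sb, const char *dir, const char *rel)
{
	size_t need = sb->len + strlen(dir) + 1 + strlen(rel) + 1;

	if (need > sb->alloc) {
		char *p = realloc(sb->buf, need);

		assert(p != NULL);
		sb->buf = p;
		sb->alloc = need;
	}
	sb->len += (size_t)sprintf(sb->buf + sb->len, "%s/%s", dir, rel);
}

/* Forget the contents but keep the allocation, like strbuf_reset(). */
static void toy_buf_reset(struct toy_buf *sb)
{
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

typedef int (*toy_check_fn)(const char *path);

static int check_not_lock(const char *path)
{
	size_t n = strlen(path);

	return (n >= 5 && !strcmp(path + n - 5, ".lock")) ? -1 : 0;
}

static int check_no_space(const char *path)
{
	return strchr(path, ' ') ? -1 : 0;
}

/* NULL-terminated table, like the fsck_refs_fn[] array in the patch. */
static const toy_check_fn toy_checks[] = { check_not_lock, check_no_space, NULL };

/*
 * Build each name once into a reused buffer, then pass the same string to
 * every check. *grows counts how often the buffer had to be enlarged.
 */
static int toy_check_dir(const char *dir, const char **entries, size_t *grows)
{
	struct toy_buf name = { NULL, 0, 0 };
	int ret = 0;

	for (size_t e = 0; entries[e]; e++) {
		size_t old_alloc = name.alloc;

		toy_buf_reset(&name);
		toy_buf_add_path(&name, dir, entries[e]);
		if (name.alloc != old_alloc)
			(*grows)++;
		for (size_t i = 0; toy_checks[i]; i++)
			if (toy_checks[i](name.buf))
				ret = -1;
	}
	free(name.buf);
	return ret;
}
```

With entries whose names get shorter, the buffer is allocated once and then reused. git's real strbuf over-allocates when it grows, so it is resized even less often than this exact-fit sketch.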

Patrick

* Re: [PATCH v6 4/9] ref: support multiple worktrees check for refs
  2024-10-21 13:34             ` [PATCH v6 4/9] ref: support multiple worktrees check for refs shejialuo
  2024-10-21 15:56               ` karthik nayak
@ 2024-11-05  7:11               ` Patrick Steinhardt
  2024-11-05 12:52                 ` shejialuo
  1 sibling, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-05  7:11 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 21, 2024 at 09:34:40PM +0800, shejialuo wrote:
> We have already set up the infrastructure to check the consistency for
> refs, but we do not support multiple worktrees. As we decide to add more
> checks for ref content, we need to set up support for multiple
> worktrees.

I don't quite follow that logic: the fact that we perform more checks
for the ref content doesn't necessarily mean that we also have to check
worktree refs. We rather want to do that so that we get feature parity
with git-fsck(1) eventually, don't we?

> @@ -66,6 +67,7 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
>  static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
>  {
>  	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
> +	struct worktree **worktrees, **p;
>  	const char * const verify_usage[] = {
>  		REFS_VERIFY_USAGE,
>  		NULL,

Instead of declaring the `**p` variable we can instead...

> @@ -84,9 +86,15 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
>  	git_config(git_fsck_config, &fsck_refs_options);
>  	prepare_repo_settings(the_repository);
>  
> -	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
> +	worktrees = get_worktrees();
> +	for (p = worktrees; *p; p++) {
> +		struct worktree *wt = *p;
> +		ret |= refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options, wt);
> +	}
> +

... refactor this loop like this:

    for (size_t i = 0; worktrees[i]; i++)
        ret |= refs_fsck(get_worktree_ref_store(worktrees[i]),
                         &fsck_refs_options, worktrees[i]);

I was briefly wondering whether we also get worktrees in case the repo
is bare, as we don't actually have a proper worktree there. But the
answer seems to be "yes".

> @@ -3558,6 +3560,9 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
>  		} else if (S_ISREG(iter->st.st_mode) ||
>  			   S_ISLNK(iter->st.st_mode)) {
>  			strbuf_reset(&target_name);
> +
> +			if (!is_main_worktree(wt))
> +				strbuf_addf(&target_name, "worktrees/%s/", wt->id);
>  			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
>  				    iter->relative_path);
>  

Hm. Isn't it somewhat duplicate to pass both the prepended target name
_and_ the worktree to the callback? I imagine that we'd have to
eventually strip the worktree prefix to find the correct ref, unless we
end up using the main ref store to look up the ref.

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 07c57fd541..46dcaec654 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -13,6 +13,7 @@
>  #include "../lockfile.h"
>  #include "../chdir-notify.h"
>  #include "../statinfo.h"
> +#include "../worktree.h"
>  #include "../wrapper.h"
>  #include "../write-or-die.h"
>  #include "../trace2.h"
> @@ -1754,8 +1755,13 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
>  }
>  
>  static int packed_fsck(struct ref_store *ref_store UNUSED,
> -		       struct fsck_options *o UNUSED)
> +		       struct fsck_options *o UNUSED,
> +		       struct worktree *wt)
>  {
> +
> +	if (!is_main_worktree(wt))
> +		return 0;
> +
>  	return 0;
>  }

It's somewhat funny to have this condition here, but it does make sense
overall as worktrees never have packed refs in the first place.
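Patrick's index-based loop and the early return in `packed_fsck()` fit together as sketched below. `struct toy_worktree`, `toy_files_fsck()` and `toy_packed_fsck()` are hypothetical, and treating a NULL `id` as the main worktree is an assumption of this sketch. The point is that every worktree's loose refs are visited while the shared packed-refs check runs only once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct worktree; NULL id = main worktree here. */
struct toy_worktree {
	const char *id;
};

static int packed_checks_run; /* how often the shared packed-refs was checked */

static int toy_is_main(const struct toy_worktree *wt)
{
	return wt->id == NULL;
}

/* Loose refs are per worktree; pretend the worktree "bad" has a broken one. */
static int toy_files_fsck(const struct toy_worktree *wt)
{
	return (wt->id && !strcmp(wt->id, "bad")) ? -1 : 0;
}

/* packed-refs is shared, so only check it when called for the main worktree. */
static int toy_packed_fsck(const struct toy_worktree *wt)
{
	if (!toy_is_main(wt))
		return 0;
	packed_checks_run++;
	return 0;
}

/* Index-based loop over the NULL-terminated array, as in the review. */
static int toy_refs_verify(struct toy_worktree **worktrees)
{
	int ret = 0;

	assert(worktrees != NULL);
	for (size_t i = 0; worktrees[i]; i++) {
		ret |= toy_files_fsck(worktrees[i]);
		ret |= toy_packed_fsck(worktrees[i]);
	}
	return ret;
}
```

Note that `ret` has to start at 0 for `ret |= ...` to accumulate failures; with -1 as the error value, one failing worktree is enough to make the result -1.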

Patrick

* Re: [PATCH v6 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-10-21 13:34             ` [PATCH v6 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-11-05  7:11               ` Patrick Steinhardt
  0 siblings, 0 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-05  7:11 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Mon, Oct 21, 2024 at 09:34:47PM +0800, shejialuo wrote:
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 24ad73faba..2861980bdd 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3505,6 +3505,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *target_name,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *target_name,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct fsck_ref_report report = { 0 };
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	report.path = target_name;
> +
> +	if (S_ISLNK(iter->st.st_mode))
> +		goto cleanup;
> +
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_BAD_REF_CONTENT,
> +				      "cannot read ref file '%s': (%s)",
> +				      iter->path.buf, strerror(errno));
> +		goto cleanup;
> +	}

Let's drop the braces around `(%s)`, we don't print such braces in
`warning_errno()` or `die_errno()`, either.
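A minimal sketch of the suggested message shape, with the reason appended after a colon and no parentheses. `toy_open_ref()` is hypothetical and uses `fopen()` instead of git's `strbuf_read_file()`; only the format of the message matters here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: open a ref file and, on failure, write a message in
 * the "<what>: <reason>" shape used by die_errno()/warning_errno(), i.e.
 * without parentheses around the strerror() text.
 */
static int toy_open_ref(const char *path, char *msg, size_t msglen)
{
	FILE *fp;

	assert(msg != NULL && msglen > 0);
	fp = fopen(path, "r");
	if (!fp) {
		snprintf(msg, msglen, "cannot read ref file '%s': %s",
			 path, strerror(errno));
		return -1;
	}
	fclose(fp);
	msg[0] = '\0';
	return 0;
}
```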

Patrick

* Re: [PATCH v6 4/9] ref: support multiple worktrees check for refs
  2024-11-05  7:11               ` Patrick Steinhardt
@ 2024-11-05 12:52                 ` shejialuo
  2024-11-06  6:34                   ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-11-05 12:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Tue, Nov 05, 2024 at 08:11:49AM +0100, Patrick Steinhardt wrote:
> On Mon, Oct 21, 2024 at 09:34:40PM +0800, shejialuo wrote:
> > We have already set up the infrastructure to check the consistency for
> > refs, but we do not support multiple worktrees. As we decide to add more
> > checks for ref content, we need to set up support for multiple
> > worktrees.
> 
> I don't quite follow that logic: the fact that we perform more checks
> for the ref content doesn't necessarily mean that we also have to check
> worktree refs. We rather want to do that so that we get feature parity
> with git-fsck(1) eventually, don't we?
> 

Yes, I agree. I now recall why I wrote such a message. In the very
early implementation, I didn't consider the worktree situation for the
"escape", so I thought I had to add worktree support first. That
reasoning was a mistake.

[snip]

> > @@ -3558,6 +3560,9 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
> >  		} else if (S_ISREG(iter->st.st_mode) ||
> >  			   S_ISLNK(iter->st.st_mode)) {
> >  			strbuf_reset(&target_name);
> > +
> > +			if (!is_main_worktree(wt))
> > +				strbuf_addf(&target_name, "worktrees/%s/", wt->id);
> >  			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
> >  				    iter->relative_path);
> >  
> 
> Hm. Isn't it somewhat duplicate to pass both the prepended target name
> _and_ the worktree to the callback? I imagine that we'd have to
> eventually strip the worktree prefix to find the correct ref, unless we
> end up using the main ref store to look up the ref.
> 

Actually, the worktree won't be passed to the callback; the
`fsck_refs_fn` functions never use `wt`. The reason I use `wt` is that
we need to print the _full_ path to the user when an error occurs and
worktree A and worktree B have the same ref name "refs/worktree/foo".

I agree that we will strip the worktree prefix to find the correct ref
in the file system. This is done by the following statement:

	strbuf_addf(&sb, "%s/%s", ref_store->gitdir, refs_check_dir);

For worktree, `ref_store->gitdir` will automatically be
`.git/worktrees/<id>`.

In v5, I didn't print the full path, and we didn't even need the
parameter `wt`. However, we now want to print information like:

	worktrees/<id>/refs/worktree/a

That is the only reason we need the `worktrees/<id>` information. We
could also derive it from "ref_store->gitdir" and
"ref_store->repo->gitdir", but that is cumbersome and a bad idea. So I
changed the prototype of "fsck_fn" to add a new parameter
"struct worktree *".
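The naming scheme described above can be sketched as follows. `struct toy_wt` and `toy_report_name()` are hypothetical, and in this sketch a NULL `id` stands for the main worktree. Only the reported name gets the `worktrees/<id>/` prefix; the on-disk path is still built from the ref store's own gitdir, as explained above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in; in this sketch a NULL id marks the main worktree. */
struct toy_wt {
	const char *id;
};

/*
 * Format the name reported to the user. Linked worktrees get a
 * "worktrees/<id>/" prefix so that refs/worktree/foo in two different
 * worktrees produce distinguishable messages. Returns what snprintf()
 * returns.
 */
static int toy_report_name(char *out, size_t outlen, const struct toy_wt *wt,
			   const char *refs_check_dir, const char *relative_path)
{
	assert(out != NULL && outlen > 0);
	if (wt->id)
		return snprintf(out, outlen, "worktrees/%s/%s/%s",
				wt->id, refs_check_dir, relative_path);
	return snprintf(out, outlen, "%s/%s", refs_check_dir, relative_path);
}
```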

> > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > index 07c57fd541..46dcaec654 100644
> > --- a/refs/packed-backend.c
> > +++ b/refs/packed-backend.c
> > @@ -13,6 +13,7 @@
> >  #include "../lockfile.h"
> >  #include "../chdir-notify.h"
> >  #include "../statinfo.h"
> > +#include "../worktree.h"
> >  #include "../wrapper.h"
> >  #include "../write-or-die.h"
> >  #include "../trace2.h"
> > @@ -1754,8 +1755,13 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
> >  }
> >  
> >  static int packed_fsck(struct ref_store *ref_store UNUSED,
> > -		       struct fsck_options *o UNUSED)
> > +		       struct fsck_options *o UNUSED,
> > +		       struct worktree *wt)
> >  {
> > +
> > +	if (!is_main_worktree(wt))
> > +		return 0;
> > +
> >  	return 0;
> >  }
> 
> It's somewhat funny to have this condition here, but it does make sense
> overall as worktrees never have packed refs in the first place.
> 

Yes, there is no packed-refs file in a worktree, and we need to avoid
checking the shared one multiple times.

> Patrick

Thanks,
Jialuo

* Re: [PATCH v6 4/9] ref: support multiple worktrees check for refs
  2024-11-05 12:52                 ` shejialuo
@ 2024-11-06  6:34                   ` Patrick Steinhardt
  2024-11-06 12:20                     ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-06  6:34 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Tue, Nov 05, 2024 at 08:52:19PM +0800, shejialuo wrote:
> On Tue, Nov 05, 2024 at 08:11:49AM +0100, Patrick Steinhardt wrote:
> > On Mon, Oct 21, 2024 at 09:34:40PM +0800, shejialuo wrote:
> > > @@ -3558,6 +3560,9 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
> > >  		} else if (S_ISREG(iter->st.st_mode) ||
> > >  			   S_ISLNK(iter->st.st_mode)) {
> > >  			strbuf_reset(&target_name);
> > > +
> > > +			if (!is_main_worktree(wt))
> > > +				strbuf_addf(&target_name, "worktrees/%s/", wt->id);
> > >  			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
> > >  				    iter->relative_path);
> > >  
> > 
> > Hm. Isn't it somewhat duplicate to pass both the prepended target name
> > _and_ the worktree to the callback? I imagine that we'd have to
> > eventually strip the worktree prefix to find the correct ref, unless we
> > end up using the main ref store to look up the ref.
> > 
> 
> Actually, the worktree won't be passed to the callback. The
> `fsck_refs_fn` function will never use worktree `wt`. The reason why I
> use `wt` is that we need to print _full_ path information to the user
> when error happens for the situation where worktree A and worktree B has
> the same ref name "refs/worktree/foo".
> 
> I agree that we will strip the worktree prefix to find the correct ref
> in the file system. This is done by the following statement:
> 
> 	strbuf_addf(&sb, "%s/%s", ref_store->gitdir, refs_check_dir);
> 
> For worktree, `ref_store->gitdir` will automatically be
> `.git/worktrees/<id>`.
> 
> In the v5, I didn't print the full path and we even didn't need the
> parameter `wt`. However, if we want to print the following info:
> 
> 	worktrees/<id>/refs/worktree/a
> 
> So, just because we need the `worktrees/<id>` information. Actually, we
> could also get the information by using "ref_store->gitdir" and
> "ref_store->repo->gitdir". However, this is cumbersome and it's a bad
> idea. So I change the prototype of "fsck_fn" to add a new parameter
> "struct worktree *".

In practice you can also derive that full refname from the worktree
itself, as the ID is stored in `struct worktree::id`. Would that maybe
be a better solution?

Patrick

* Re: [PATCH v6 4/9] ref: support multiple worktrees check for refs
  2024-11-06  6:34                   ` Patrick Steinhardt
@ 2024-11-06 12:20                     ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-06 12:20 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Nov 06, 2024 at 07:34:08AM +0100, Patrick Steinhardt wrote:

[snip]

> 
> In practice you can also derive that full refname from the worktree
> itself, as the ID is stored in `struct worktree::id`. Would that maybe
> be a better solution?
> 

I think we are on the same page. This is exactly what I have done in
this patch.

> Patrick

Thanks,
Jialuo

* Re: [PATCH v6 3/9] ref: initialize target name outside of check functions
  2024-11-05  7:11               ` Patrick Steinhardt
@ 2024-11-06 12:32                 ` shejialuo
  2024-11-06 13:14                   ` Patrick Steinhardt
  0 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-11-06 12:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Tue, Nov 05, 2024 at 08:11:46AM +0100, Patrick Steinhardt wrote:
> On Mon, Oct 21, 2024 at 09:34:31PM +0800, shejialuo wrote:
> > We passes "refs_check_dir" to the "files_fsck_refs_name" function which
> > allows it to create the checked ref name later. However, when we
> > introduce a new check function, we have to re-calculate the target name.
> > It's bad for us to do repeat calculation. Instead, we should calculate
> > it only once and pass the target name to the check functions.
> 
> It would be nice to clarify what exactly is bad about it. Does it create
> extra memory churn? Or is this about not duplicating logic?
> 

Thanks, I will improve this in the next version.

> > In order not to do repeat calculation, rename "refs_check_dir" to
> > "target_name". And in "files_fsck_refs_dir", create a new strbuf
> > "target_name", thus whenever we handle a new target, calculate the
> > name and call the check functions one by one.
> 
> "target_name" is somewhat of a weird name. I'd expect that this is
> either the path to the reference, in which case I'd call this "path", or
> the name of the reference that is to be checked, in which case I'd call
> this "refname".
> 

I found this variable quite hard to name when I wrote the code. "refname"
is not suitable because we may later check the reflog by calling the
"files_fsck_refs_dir" function.

So we should use "path" here.

Thanks,
Jialuo

* Re: [PATCH v6 2/9] ref: check the full refname instead of basename
  2024-11-05  7:11               ` Patrick Steinhardt
@ 2024-11-06 12:37                 ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-06 12:37 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Tue, Nov 05, 2024 at 08:11:42AM +0100, Patrick Steinhardt wrote:

[snip]

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 03d2503276..f246c92684 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3519,10 +3519,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
> >  	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
> >  		goto cleanup;
> >  
> > -	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
> > +	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
> > +	if (check_refname_format(sb.buf, 0)) {
> >  		struct fsck_ref_report report = { 0 };
> >  
> > -		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
> >  		report.path = sb.buf;
> >  		ret = fsck_report_ref(o, &report,
> >  				      FSCK_MSG_BAD_REF_NAME,
> 
> So this only works right now because we never check root refs in the
> first place? Maybe that is worth a comment.
> 

Yes, I agree. I will improve this in the next version.

> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > index 71a4d1a5ae..0aee377439 100755
> > --- a/t/t0602-reffiles-fsck.sh
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -25,6 +25,13 @@ test_expect_success 'ref name should be checked' '
> >  	git tag tag-2 &&
> >  	git tag multi_hierarchy/tag-2 &&
> >  
> > +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> > +	git refs verify 2>err &&
> > +	cat >expect <<-EOF &&
> > +	EOF
> > +	test_must_be_empty err &&
> > +	rm $branch_dir_prefix/@ &&
> 
> `expect` isn't used here as you use `test_must_be_empty`.
> 

Thanks, I will improve this in the next version.

> >  	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
> >  	test_must_fail git refs verify 2>err &&
> >  	cat >expect <<-EOF &&
> > @@ -33,20 +40,20 @@ test_expect_success 'ref name should be checked' '
> >  	rm $branch_dir_prefix/.branch-1 &&
> >  	test_cmp expect err &&
> >  
> > -	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
> > +	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\'' branch-1'\'' &&
> >  	test_must_fail git refs verify 2>err &&
> >  	cat >expect <<-EOF &&
> > -	error: refs/heads/@: badRefName: invalid refname format
> > +	error: refs/heads/ branch-1: badRefName: invalid refname format
> >  	EOF
> > -	rm $branch_dir_prefix/@ &&
> > +	rm $branch_dir_prefix/'\'' branch-1'\'' &&
> >  	test_cmp expect err &&
> 
> Okay, we now allow `refs/heads/@`, but still don't allow other bad
> formatting like spaces in the refname.
> 

Yes, this was a mistake. Junio pointed it out in the following message,
and I now see the problem:

  https://lore.kernel.org/git/xmqqjzei1mtb.fsf@gitster.g/

> Patrick

Thanks,
Jialuo

* Re: [PATCH v6 3/9] ref: initialize target name outside of check functions
  2024-11-06 12:32                 ` shejialuo
@ 2024-11-06 13:14                   ` Patrick Steinhardt
  0 siblings, 0 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-06 13:14 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Nov 06, 2024 at 08:32:19PM +0800, shejialuo wrote:
> On Tue, Nov 05, 2024 at 08:11:46AM +0100, Patrick Steinhardt wrote:
> > On Mon, Oct 21, 2024 at 09:34:31PM +0800, shejialuo wrote:
> > > In order not to do repeat calculation, rename "refs_check_dir" to
> > > "target_name". And in "files_fsck_refs_dir", create a new strbuf
> > > "target_name", thus whenever we handle a new target, calculate the
> > > name and call the check functions one by one.
> > 
> > "target_name" is somewhat of a weird name. I'd expect that this is
> > either the path to the reference, in which case I'd call this "path", or
> > the name of the reference that is to be checked, in which case I'd call
> > this "refname".
> > 
> 
> I found it quite hard to name this variable when I wrote the code. "refname"
> is not suitable because we may later check the reflog by calling the
> "files_fsck_refs_dir" function.

I anticipate that we'll likely have separate infra for checking reflogs
as they are both stored in a different directory and because their
format is completely different compared to normal refs. So there isn't
really too much of a point to plan ahead for sharing logic here, I'd
think, and thus "refname" might be a better fit. If that changes in the
future we can still refactor the code.

Patrick

* [PATCH v7 0/9] add ref content check for files backend
  2024-10-21 13:32           ` [PATCH v6 " shejialuo
                               ` (10 preceding siblings ...)
  2024-10-21 16:18             ` Taylor Blau
@ 2024-11-10 12:07             ` shejialuo
  2024-11-10 12:09               ` [PATCH v7 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
                                 ` (10 more replies)
  11 siblings, 11 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:07 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This new version addresses the following problems:

1. Enhance the commit messages as suggested by Patrick.
2. Rename "target_name" to "refname".
3. Use `for ... in` loops in the shell tests to avoid repetition. This is
the main change in this version.

Thanks,
Jialuo

shejialuo (9):
  ref: initialize "fsck_ref_report" with zero
  ref: check the full refname instead of basename
  ref: initialize ref name outside of check functions
  ref: support multiple worktrees check for refs
  ref: port git-fsck(1) regular refs check for files backend
  ref: add more strict checks for regular refs
  ref: add basic symref content check for files backend
  ref: check whether the target of the symref is a ref
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  35 +++
 builtin/refs.c                |  10 +-
 fsck.h                        |   6 +
 refs.c                        |   7 +-
 refs.h                        |   3 +-
 refs/debug.c                  |   5 +-
 refs/files-backend.c          | 190 ++++++++++++--
 refs/packed-backend.c         |   8 +-
 refs/refs-internal.h          |   5 +-
 refs/reftable-backend.c       |   3 +-
 t/t0602-reffiles-fsck.sh      | 480 +++++++++++++++++++++++++++++++---
 11 files changed, 690 insertions(+), 62 deletions(-)

Range-diff against v6:
 1:  319f384f1c =  1:  bfb2a21af4 ref: initialize "fsck_ref_report" with zero
 2:  8662fc9679 !  2:  9efc83f7ea ref: check the full refname instead of basename
    @@ Commit message
     
         In order to fix the above problem, enhance "files_fsck_refs_name" to use
         the full name for "check_refname_format". Then, replace the tests which
    -    are related to "@" and add tests to exercise the above situations.
    +    are related to "@" and add tests to exercise the above situations using
    +    for loop to avoid repetition.
     
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
    @@ refs/files-backend.c: static int files_fsck_refs_name(struct ref_store *ref_stor
      		goto cleanup;
      
     -	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
    ++	/*
    ++	 * This works right now because we never check the root refs.
    ++	 */
     +	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
     +	if (check_refname_format(sb.buf, 0)) {
      		struct fsck_ref_report report = { 0 };
    @@ refs/files-backend.c: static int files_fsck_refs_name(struct ref_store *ref_stor
     
      ## t/t0602-reffiles-fsck.sh ##
     @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name should be checked' '
    - 	git tag tag-2 &&
    - 	git tag multi_hierarchy/tag-2 &&
    + 	cd repo &&
      
    -+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
    -+	git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	EOF
    -+	test_must_be_empty err &&
    -+	rm $branch_dir_prefix/@ &&
    -+
    - 	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
    - 	test_must_fail git refs verify 2>err &&
    - 	cat >expect <<-EOF &&
    -@@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name should be checked' '
    - 	rm $branch_dir_prefix/.branch-1 &&
    - 	test_cmp expect err &&
    + 	git commit --allow-empty -m initial &&
    +-	git checkout -b branch-1 &&
    +-	git tag tag-1 &&
    +-	git commit --allow-empty -m second &&
    +-	git checkout -b branch-2 &&
    +-	git tag tag-2 &&
    +-	git tag multi_hierarchy/tag-2 &&
    ++	git checkout -b default-branch &&
    ++	git tag default-tag &&
    ++	git tag multi_hierarchy/default-tag &&
      
    +-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
    +-	test_must_fail git refs verify 2>err &&
    +-	cat >expect <<-EOF &&
    +-	error: refs/heads/.branch-1: badRefName: invalid refname format
    +-	EOF
    +-	rm $branch_dir_prefix/.branch-1 &&
    +-	test_cmp expect err &&
    +-
     -	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
    -+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\'' branch-1'\'' &&
    - 	test_must_fail git refs verify 2>err &&
    - 	cat >expect <<-EOF &&
    +-	test_must_fail git refs verify 2>err &&
    +-	cat >expect <<-EOF &&
     -	error: refs/heads/@: badRefName: invalid refname format
    -+	error: refs/heads/ branch-1: badRefName: invalid refname format
    - 	EOF
    --	rm $branch_dir_prefix/@ &&
    -+	rm $branch_dir_prefix/'\'' branch-1'\'' &&
    - 	test_cmp expect err &&
    +-	EOF
    ++	cp $branch_dir_prefix/default-branch $branch_dir_prefix/@ &&
    ++	git refs verify 2>err &&
    ++	test_must_be_empty err &&
    + 	rm $branch_dir_prefix/@ &&
    +-	test_cmp expect err &&
      
     -	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/@ &&
    -+	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/'\''~tag-2'\'' &&
    - 	test_must_fail git refs verify 2>err &&
    - 	cat >expect <<-EOF &&
    +-	test_must_fail git refs verify 2>err &&
    +-	cat >expect <<-EOF &&
     -	error: refs/tags/multi_hierarchy/@: badRefName: invalid refname format
    -+	error: refs/tags/multi_hierarchy/~tag-2: badRefName: invalid refname format
    - 	EOF
    +-	EOF
     -	rm $tag_dir_prefix/multi_hierarchy/@ &&
    -+	rm $tag_dir_prefix/multi_hierarchy/'\''~tag-2'\'' &&
    - 	test_cmp expect err &&
    +-	test_cmp expect err &&
    +-
    +-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
    ++	cp $tag_dir_prefix/default-tag $tag_dir_prefix/tag-1.lock &&
    + 	git refs verify 2>err &&
    + 	rm $tag_dir_prefix/tag-1.lock &&
    + 	test_must_be_empty err &&
      
    - 	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
    -@@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name should be checked' '
    +-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/.lock &&
    ++	cp $tag_dir_prefix/default-tag $tag_dir_prefix/.lock &&
    + 	test_must_fail git refs verify 2>err &&
    + 	cat >expect <<-EOF &&
      	error: refs/tags/.lock: badRefName: invalid refname format
      	EOF
      	rm $tag_dir_prefix/.lock &&
    +-	test_cmp expect err
     +	test_cmp expect err &&
     +
    -+	mkdir $tag_dir_prefix/'\''~new-feature'\'' &&
    -+	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/'\''~new-feature'\''/tag-1 &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: refs/tags/~new-feature/tag-1: badRefName: invalid refname format
    -+	EOF
    -+	rm -rf $tag_dir_prefix/'\''~new-feature'\'' &&
    - 	test_cmp expect err
    ++	for refname in ".refname-starts-with-dot" "~refname-has-stride"
    ++	do
    ++		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname" &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: refs/heads/$refname: badRefName: invalid refname format
    ++		EOF
    ++		rm "$branch_dir_prefix/$refname" &&
    ++		test_cmp expect err || return 1
    ++	done &&
    ++
    ++	for refname in ".refname-starts-with-dot" "~refname-has-stride"
    ++	do
    ++		cp $tag_dir_prefix/default-tag "$tag_dir_prefix/$refname" &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: refs/tags/$refname: badRefName: invalid refname format
    ++		EOF
    ++		rm "$tag_dir_prefix/$refname" &&
    ++		test_cmp expect err || return 1
    ++	done &&
    ++
    ++	for refname in ".refname-starts-with-dot" "~refname-has-stride"
    ++	do
    ++		cp $tag_dir_prefix/multi_hierarchy/default-tag "$tag_dir_prefix/multi_hierarchy/$refname" &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: refs/tags/multi_hierarchy/$refname: badRefName: invalid refname format
    ++		EOF
    ++		rm "$tag_dir_prefix/multi_hierarchy/$refname" &&
    ++		test_cmp expect err || return 1
    ++	done &&
    ++
    ++	for refname in ".refname-starts-with-dot" "~refname-has-stride"
    ++	do
    ++		mkdir "$branch_dir_prefix/$refname" &&
    ++		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname/default-branch" &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: refs/heads/$refname/default-branch: badRefName: invalid refname format
    ++		EOF
    ++		rm -r "$branch_dir_prefix/$refname" &&
    ++		test_cmp expect err || return 1
    ++	done
      '
      
    + test_expect_success 'ref name check should be adapted into fsck messages' '
    + 	test_when_finished "rm -rf repo" &&
    + 	git init repo &&
    + 	branch_dir_prefix=.git/refs/heads &&
    +-	tag_dir_prefix=.git/refs/tags &&
    + 	cd repo &&
    + 	git commit --allow-empty -m initial &&
    + 	git checkout -b branch-1 &&
    +-	git tag tag-1 &&
    +-	git commit --allow-empty -m second &&
    +-	git checkout -b branch-2 &&
    +-	git tag tag-2 &&
    + 
    + 	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
    + 	git -c fsck.badRefName=warn refs verify 2>err &&
     @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name check should be adapted into fsck messages' '
      	rm $branch_dir_prefix/.branch-1 &&
      	test_cmp expect err &&
      
     -	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
    -+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/'\''~branch-1'\'' &&
    ++	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
      	git -c fsck.badRefName=ignore refs verify 2>err &&
      	test_must_be_empty err
      '
 3:  96144756fe !  3:  5ea7d18203 ref: initialize target name outside of check functions
    @@ Metadata
     Author: shejialuo <shejialuo@gmail.com>
     
      ## Commit message ##
    -    ref: initialize target name outside of check functions
    +    ref: initialize ref name outside of check functions
     
         We passes "refs_check_dir" to the "files_fsck_refs_name" function which
         allows it to create the checked ref name later. However, when we
    -    introduce a new check function, we have to re-calculate the target name.
    -    It's bad for us to do repeat calculation. Instead, we should calculate
    -    it only once and pass the target name to the check functions.
    +    introduce a new check function, we have to allocate redundant memory and
    +    re-calculate the ref name. It's bad for us to allocate redundant memory
    +    and duplicate logic. Instead, we should allocate and calculate it only
    +    once and pass the ref name to the check functions.
     
         In order not to do repeat calculation, rename "refs_check_dir" to
    -    "target_name". And in "files_fsck_refs_dir", create a new strbuf
    -    "target_name", thus whenever we handle a new target, calculate the
    -    name and call the check functions one by one.
    +    "refname". And in "files_fsck_refs_dir", create a new strbuf "refname",
    +    thus whenever we handle a new ref, calculate the name and call the check
    +    functions one by one.
     
         Mentored-by: Patrick Steinhardt <ps@pks.im>
         Mentored-by: Karthik Nayak <karthik.188@gmail.com>
    @@ refs/files-backend.c: static int files_ref_store_remove_on_disk(struct ref_store
      typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
      				  struct fsck_options *o,
     -				  const char *refs_check_dir,
    -+				  const char *target_name,
    ++				  const char *refname,
      				  struct dir_iterator *iter);
      
      static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
      				struct fsck_options *o,
     -				const char *refs_check_dir,
    -+				const char *target_name,
    ++				const char *refname,
      				struct dir_iterator *iter)
      {
      	struct strbuf sb = STRBUF_INIT;
     @@ refs/files-backend.c: static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
    - 	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
    - 		goto cleanup;
    - 
    + 	/*
    + 	 * This works right now because we never check the root refs.
    + 	 */
     -	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
     -	if (check_refname_format(sb.buf, 0)) {
    -+	if (check_refname_format(target_name, 0)) {
    ++	if (check_refname_format(refname, 0)) {
      		struct fsck_ref_report report = { 0 };
      
     -		report.path = sb.buf;
    -+		report.path = target_name;
    ++		report.path = refname;
      		ret = fsck_report_ref(o, &report,
      				      FSCK_MSG_BAD_REF_NAME,
      				      "invalid refname format");
    @@ refs/files-backend.c: static int files_fsck_refs_dir(struct ref_store *ref_store
      			       const char *refs_check_dir,
      			       files_fsck_refs_fn *fsck_refs_fn)
      {
    -+	struct strbuf target_name = STRBUF_INIT;
    ++	struct strbuf refname = STRBUF_INIT;
      	struct strbuf sb = STRBUF_INIT;
      	struct dir_iterator *iter;
      	int iter_status;
    @@ refs/files-backend.c: static int files_fsck_refs_dir(struct ref_store *ref_store
      			continue;
      		} else if (S_ISREG(iter->st.st_mode) ||
      			   S_ISLNK(iter->st.st_mode)) {
    -+			strbuf_reset(&target_name);
    -+			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
    ++			strbuf_reset(&refname);
    ++			strbuf_addf(&refname, "%s/%s", refs_check_dir,
     +				    iter->relative_path);
     +
      			if (o->verbose)
     -				fprintf_ln(stderr, "Checking %s/%s",
     -					   refs_check_dir, iter->relative_path);
    -+				fprintf_ln(stderr, "Checking %s", target_name.buf);
    ++				fprintf_ln(stderr, "Checking %s", refname.buf);
     +
      			for (size_t i = 0; fsck_refs_fn[i]; i++) {
     -				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
    -+				if (fsck_refs_fn[i](ref_store, o, target_name.buf, iter))
    ++				if (fsck_refs_fn[i](ref_store, o, refname.buf, iter))
      					ret = -1;
      			}
      		} else {
    @@ refs/files-backend.c: static int files_fsck_refs_dir(struct ref_store *ref_store
      
      out:
      	strbuf_release(&sb);
    -+	strbuf_release(&target_name);
    ++	strbuf_release(&refname);
      	return ret;
      }
      
 4:  b396bf6bc2 !  4:  cb4669b64d ref: support multiple worktrees check for refs
    @@ Commit message
         ref: support multiple worktrees check for refs
     
         We have already set up the infrastructure to check the consistency for
    -    refs, but we do not support multiple worktrees. As we decide to add more
    -    checks for ref content, we need to set up support for multiple
    -    worktrees.
    +    refs, but we do not support multiple worktrees. However, "git-fsck(1)"
    +    will check the refs of worktrees. As we decide to get feature parity
    +    with "git-fsck(1)", we need to set up support for multiple worktrees.
     
         Because each worktree has its own specific refs, instead of just showing
         the users "refs/worktree/foo", we need to display the full name such as
    @@ builtin/refs.c: static int cmd_refs_migrate(int argc, const char **argv, const c
      static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
      {
      	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
    -+	struct worktree **worktrees, **p;
    ++	struct worktree **worktrees;
      	const char * const verify_usage[] = {
      		REFS_VERIFY_USAGE,
      		NULL,
    @@ builtin/refs.c: static int cmd_refs_verify(int argc, const char **argv, const ch
      
     -	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
     +	worktrees = get_worktrees();
    -+	for (p = worktrees; *p; p++) {
    -+		struct worktree *wt = *p;
    -+		ret |= refs_fsck(get_worktree_ref_store(wt), &fsck_refs_options, wt);
    -+	}
    -+
    ++	for (size_t i = 0; worktrees[i]; i++)
    ++		ret |= refs_fsck(get_worktree_ref_store(worktrees[i]),
    ++				 &fsck_refs_options, worktrees[i]);
      
      	fsck_options_clear(&fsck_refs_options);
     +	free_worktrees(worktrees);
    @@ refs/files-backend.c: static int files_fsck_refs_name(struct ref_store *ref_stor
     +			       struct worktree *wt,
      			       files_fsck_refs_fn *fsck_refs_fn)
      {
    - 	struct strbuf target_name = STRBUF_INIT;
    + 	struct strbuf refname = STRBUF_INIT;
     @@ refs/files-backend.c: static int files_fsck_refs_dir(struct ref_store *ref_store,
      		} else if (S_ISREG(iter->st.st_mode) ||
      			   S_ISLNK(iter->st.st_mode)) {
    - 			strbuf_reset(&target_name);
    + 			strbuf_reset(&refname);
     +
     +			if (!is_main_worktree(wt))
    -+				strbuf_addf(&target_name, "worktrees/%s/", wt->id);
    - 			strbuf_addf(&target_name, "%s/%s", refs_check_dir,
    ++				strbuf_addf(&refname, "worktrees/%s/", wt->id);
    + 			strbuf_addf(&refname, "%s/%s", refs_check_dir,
      				    iter->relative_path);
      
     @@ refs/files-backend.c: static int files_fsck_refs_dir(struct ref_store *ref_store,
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name check should be adapted
     +	sort err >sorted_err &&
     +	test_cmp expect sorted_err &&
     +
    -+	(
    -+		cd worktree-1 &&
    -+		test_must_fail git refs verify 2>err &&
    -+		cat >expect <<-EOF &&
    -+		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
    -+		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
    -+		EOF
    -+		sort err >sorted_err &&
    -+		test_cmp expect sorted_err
    -+	) &&
    -+
    -+	(
    -+		cd worktree-2 &&
    -+		test_must_fail git refs verify 2>err &&
    -+		cat >expect <<-EOF &&
    -+		error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
    -+		error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
    -+		EOF
    -+		sort err >sorted_err &&
    -+		test_cmp expect sorted_err
    -+	)
    ++	for worktree in "worktree-1" "worktree-2"
    ++	do
    ++		(
    ++			cd $worktree &&
    ++			test_must_fail git refs verify 2>err &&
    ++			cat >expect <<-EOF &&
    ++			error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
    ++			error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
    ++			EOF
    ++			sort err >sorted_err &&
    ++			test_cmp expect sorted_err || return 1
    ++		)
    ++	done
     +'
     +
      test_done
 5:  6a9e297dfc !  5:  4e1add6465 ref: port git-fsck(1) regular refs check for files backend
    @@ fsck.h: enum fsck_msg_type {
     
      ## refs/files-backend.c ##
     @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
    - 				  const char *target_name,
    + 				  const char *refname,
      				  struct dir_iterator *iter);
      
     +static int files_fsck_refs_content(struct ref_store *ref_store,
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
     +		ret = fsck_report_ref(o, &report,
     +				      FSCK_MSG_BAD_REF_CONTENT,
    -+				      "cannot read ref file '%s': (%s)",
    ++				      "cannot read ref file '%s': %s",
     +				      iter->path.buf, strerror(errno));
     +		goto cleanup;
     +	}
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +
      static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
      				struct fsck_options *o,
    - 				const char *target_name,
    + 				const char *refname,
     @@ refs/files-backend.c: static int files_fsck_refs(struct ref_store *ref_store,
      {
      	files_fsck_refs_fn fsck_refs_fn[]= {
    @@ refs/files-backend.c: static int files_fsck_refs(struct ref_store *ref_store,
     
      ## t/t0602-reffiles-fsck.sh ##
     @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name check should work for multiple worktrees' '
    - 	)
    + 	done
      '
      
     +test_expect_success 'regular ref content should be checked (individual)' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	branch_dir_prefix=.git/refs/heads &&
    -+	tag_dir_prefix=.git/refs/tags &&
     +	cd repo &&
     +	test_commit default &&
     +	mkdir -p "$branch_dir_prefix/a/b" &&
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name check should work for mu
     +	git refs verify 2>err &&
     +	test_must_be_empty err &&
     +
    -+	bad_content=$(git rev-parse main)x &&
    -+	printf "%s" $bad_content >$tag_dir_prefix/tag-bad-1 &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: refs/tags/tag-bad-1: badRefContent: $bad_content
    -+	EOF
    -+	rm $tag_dir_prefix/tag-bad-1 &&
    -+	test_cmp expect err &&
    -+
    -+	bad_content=xfsazqfxcadas &&
    -+	printf "%s" $bad_content >$tag_dir_prefix/tag-bad-2 &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: refs/tags/tag-bad-2: badRefContent: $bad_content
    -+	EOF
    -+	rm $tag_dir_prefix/tag-bad-2 &&
    -+	test_cmp expect err &&
    -+
    -+	bad_content=Xfsazqfxcadas &&
    -+	printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
    -+	EOF
    -+	rm $branch_dir_prefix/a/b/branch-bad &&
    -+	test_cmp expect err
    ++	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
    ++	do
    ++		printf "%s" $bad_content >$branch_dir_prefix/branch-bad &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: refs/heads/branch-bad: badRefContent: $bad_content
    ++		EOF
    ++		rm $branch_dir_prefix/branch-bad &&
    ++		test_cmp expect err || return 1
    ++	done &&
    ++
    ++	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
    ++	do
    ++		printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
    ++		EOF
    ++		rm $branch_dir_prefix/a/b/branch-bad &&
    ++		test_cmp expect err || return 1
    ++	done
     +'
     +
     +test_expect_success 'regular ref content should be checked (aggregate)' '
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref name check should work for mu
     +		git update-ref refs/worktree/branch-4 refs/heads/branch-1
     +	) &&
     +
    -+	bad_content_1=$(git rev-parse HEAD)x &&
    -+	bad_content_2=xfsazqfxcadas &&
    -+	bad_content_3=Xfsazqfxcadas &&
    -+
    -+	printf "%s" $bad_content_1 >$worktree1_refdir_prefix/bad-branch-1 &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content_1
    -+	EOF
    -+	rm $worktree1_refdir_prefix/bad-branch-1 &&
    -+	test_cmp expect err &&
    -+
    -+	printf "%s" $bad_content_2 >$worktree2_refdir_prefix/bad-branch-2 &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content_2
    -+	EOF
    -+	rm $worktree2_refdir_prefix/bad-branch-2 &&
    -+	test_cmp expect err &&
    -+
    -+	printf "%s" $bad_content_3 >$worktree1_refdir_prefix/bad-branch-3 &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: worktrees/worktree-1/refs/worktree/bad-branch-3: badRefContent: $bad_content_3
    -+	EOF
    -+	rm $worktree1_refdir_prefix/bad-branch-3 &&
    -+	test_cmp expect err
    ++	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
    ++	do
    ++		printf "%s" $bad_content >$worktree1_refdir_prefix/bad-branch-1 &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content
    ++		EOF
    ++		rm $worktree1_refdir_prefix/bad-branch-1 &&
    ++		test_cmp expect err || return 1
    ++	done &&
    ++
    ++	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
    ++	do
    ++		printf "%s" $bad_content >$worktree2_refdir_prefix/bad-branch-2 &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content
    ++		EOF
    ++		rm $worktree2_refdir_prefix/bad-branch-2 &&
    ++		test_cmp expect err || return 1
    ++	done
     +'
     +
      test_done
 6:  7eea024182 !  6:  945322fab7 ref: add more strict checks for regular refs
    @@ refs/refs-internal.h: struct ref_store {
     
      ## t/t0602-reffiles-fsck.sh ##
     @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be checked (individual)' '
    - 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
    - 	EOF
    - 	rm $branch_dir_prefix/a/b/branch-bad &&
    -+	test_cmp expect err &&
    + 		EOF
    + 		rm $branch_dir_prefix/a/b/branch-bad &&
    + 		test_cmp expect err || return 1
    +-	done
    ++	done &&
     +
     +	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
     +	git refs verify 2>err &&
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be che
     +	rm $branch_dir_prefix/branch-no-newline &&
     +	test_cmp expect err &&
     +
    -+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
    -+	git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
    -+	EOF
    -+	rm $branch_dir_prefix/branch-garbage &&
    -+	test_cmp expect err &&
    -+
    -+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
    ++	for trailing_content in " garbage" "    more garbage"
    ++	do
    ++		printf "%s" "$(git rev-parse main)$trailing_content" >$branch_dir_prefix/branch-garbage &&
    ++		git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\''$trailing_content'\''
    ++		EOF
    ++		rm $branch_dir_prefix/branch-garbage &&
    ++		test_cmp expect err || return 1
    ++	done &&
    ++
    ++	printf "%s\n\n\n" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/tags/tag-garbage-1: trailingRefContent: has trailing garbage: '\''
    ++	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
     +
     +
     +	'\''
     +	EOF
    -+	rm $tag_dir_prefix/tag-garbage-1 &&
    ++	rm $branch_dir_prefix/branch-garbage-special &&
     +	test_cmp expect err &&
     +
    -+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
    ++	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/tags/tag-garbage-2: trailingRefContent: has trailing garbage: '\''
    ++	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
     +
     +
     +	  garbage'\''
     +	EOF
    -+	rm $tag_dir_prefix/tag-garbage-2 &&
    -+	test_cmp expect err &&
    -+
    -+	printf "%s    garbage\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
    -+	git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	warning: refs/tags/tag-garbage-3: trailingRefContent: has trailing garbage: '\''    garbage
    -+	a'\''
    -+	EOF
    -+	rm $tag_dir_prefix/tag-garbage-3 &&
    -+	test_cmp expect err &&
    -+
    -+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
    -+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: refs/tags/tag-garbage-4: trailingRefContent: has trailing garbage: '\'' garbage'\''
    -+	EOF
    -+	rm $tag_dir_prefix/tag-garbage-4 &&
    - 	test_cmp expect err
    ++	rm $branch_dir_prefix/branch-garbage-special &&
    ++	test_cmp expect err
      '
      
    + test_expect_success 'regular ref content should be checked (aggregate)' '
     @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be checked (aggregate)' '
      	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
      	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be che
      	sort err >sorted_err &&
      	test_cmp expect sorted_err
     @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref content checks should work with worktrees' '
    - 	error: worktrees/worktree-1/refs/worktree/bad-branch-3: badRefContent: $bad_content_3
    - 	EOF
    - 	rm $worktree1_refdir_prefix/bad-branch-3 &&
    -+	test_cmp expect err &&
    + 		EOF
    + 		rm $worktree2_refdir_prefix/bad-branch-2 &&
    + 		test_cmp expect err || return 1
    +-	done
    ++	done &&
     +
     +	printf "%s" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-no-newline &&
     +	git refs verify 2>err &&
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'ref content checks should work wi
     +	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
     +	EOF
     +	rm $worktree1_refdir_prefix/branch-no-newline &&
    - 	test_cmp expect err
    ++	test_cmp expect err
      '
      
    + test_done
 7:  1bf36dd644 !  7:  3006eb9431 ref: add basic symref content check for files backend
    @@ fsck.h: enum fsck_msg_type {
     
      ## refs/files-backend.c ##
     @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
    - 				  const char *target_name,
    + 				  const char *refname,
      				  struct dir_iterator *iter);
      
     +static int files_fsck_symref_target(struct fsck_options *o,
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be che
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	branch_dir_prefix=.git/refs/heads &&
    -+	tag_dir_prefix=.git/refs/tags &&
     +	cd repo &&
     +	test_commit default &&
     +	mkdir -p "$branch_dir_prefix/a/b" &&
     +
    -+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
    -+	git refs verify 2>err &&
    -+	rm $branch_dir_prefix/branch-good &&
    -+	test_must_be_empty err &&
    ++	for good_referent in "refs/heads/branch" "HEAD"
    ++	do
    ++		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
    ++		git refs verify 2>err &&
    ++		rm $branch_dir_prefix/branch-good &&
    ++		test_must_be_empty err || return 1
    ++	done &&
     +
    -+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
    -+	git refs verify 2>err &&
    -+	rm $branch_dir_prefix/branch-head &&
    -+	test_must_be_empty err &&
    ++	for bad_referent in "refs/heads/.branch" "refs/heads/~branch" "refs/heads/?branch"
    ++	do
    ++		printf "ref: %s\n" $bad_referent >$branch_dir_prefix/branch-bad &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: refs/heads/branch-bad: badReferentName: points to invalid refname '\''$bad_referent'\''
    ++		EOF
    ++		rm $branch_dir_prefix/branch-bad &&
    ++		test_cmp expect err || return 1
    ++	done &&
     +
    -+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
    ++	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline &&
     +	git refs verify 2>err &&
     +	cat >expect <<-EOF &&
    -+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
    ++	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
     +	EOF
    -+	rm $branch_dir_prefix/branch-no-newline-1 &&
    ++	rm $branch_dir_prefix/branch-no-newline &&
     +	test_cmp expect err &&
     +
     +	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'regular ref content should be che
     +	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
     +	EOF
     +	rm $branch_dir_prefix/a/b/branch-complicated &&
    -+	test_cmp expect err &&
    -+
    -+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
    -+	test_must_fail git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
    -+	EOF
    -+	rm $branch_dir_prefix/branch-bad-1 &&
     +	test_cmp expect err
     +'
     +
 8:  1d200f2ade !  8:  c59d003d78 ref: check whether the target of the symref is a ref
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'textual symref content should be
     +	test_commit default &&
     +	mkdir -p "$branch_dir_prefix/a/b" &&
     +
    -+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-good &&
    -+	git refs verify 2>err &&
    -+	rm $branch_dir_prefix/branch-good &&
    -+	test_must_be_empty err &&
    ++	for good_referent in "refs/heads/branch" "HEAD" "refs/tags/tag"
    ++	do
    ++		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
    ++		git refs verify 2>err &&
    ++		rm $branch_dir_prefix/branch-good &&
    ++		test_must_be_empty err || return 1
    ++	done &&
     +
    -+	printf "ref: refs/foo\n" >$branch_dir_prefix/branch-good &&
    -+	git refs verify 2>err &&
    -+	rm $branch_dir_prefix/branch-good &&
    -+	test_must_be_empty err &&
    -+
    -+	printf "ref: refs-back/heads/main\n" >$branch_dir_prefix/branch-bad-1 &&
    -+	git refs verify 2>err &&
    -+	cat >expect <<-EOF &&
    -+	warning: refs/heads/branch-bad-1: symrefTargetIsNotARef: points to non-ref target '\''refs-back/heads/main'\''
    -+	EOF
    -+	rm $branch_dir_prefix/branch-bad-1 &&
    -+	test_cmp expect err
    ++	for nonref_referent in "refs-back/heads/branch" "refs-back/tags/tag" "reflogs/refs/heads/branch"
    ++	do
    ++		printf "ref: %s\n" $nonref_referent >$branch_dir_prefix/branch-bad-1 &&
    ++		git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: refs/heads/branch-bad-1: symrefTargetIsNotARef: points to non-ref target '\''$nonref_referent'\''
    ++		EOF
    ++		rm $branch_dir_prefix/branch-bad-1 &&
    ++		test_cmp expect err || return 1
    ++	done
     +'
     +
      test_expect_success 'ref content checks should work with worktrees' '
 9:  752f0ad22e !  9:  bb6d7f3323 ref: add symlink ref content check for files backend
    @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_s
     
      ## t/t0602-reffiles-fsck.sh ##
     @@ t/t0602-reffiles-fsck.sh: test_expect_success 'the target of the textual symref should be checked' '
    - 	test_cmp expect err
    + 	done
      '
      
     +test_expect_success SYMLINKS 'symlink symref content should be checked' '
-- 
2.47.0
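The range-diff above exercises three outcomes for a regular (non-symbolic) loose ref:
- content that is not an object ID (badRefContent),
- content without the final LF (refMissingNewline),
- an object ID followed by extra bytes (trailingRefContent).

The following is a minimal sketch of such a classifier. It is not the files backend's code. It assumes SHA-1 object IDs (40 hex digits), and the names `classify_regular_ref` and `enum ref_content_status` are hypothetical. The handling of the trailing bytes is inferred from the expected test output: for "<oid>\n\n\n" the reported garbage is "\n\n", and for "<oid> garbage" it is " garbage". As in git, a non-whitespace byte right after the hex digits makes the whole content invalid.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define HEXSZ 40 /* assumption: SHA-1 object IDs */

enum ref_content_status {
	REF_CONTENT_OK,
	REF_CONTENT_BAD,              /* cf. badRefContent */
	REF_CONTENT_MISSING_NEWLINE,  /* cf. refMissingNewline */
	REF_CONTENT_TRAILING_GARBAGE, /* cf. trailingRefContent */
};

/*
 * Hypothetical classifier for the content of a regular loose ref.
 * On REF_CONTENT_TRAILING_GARBAGE, *garbage points at the extra bytes.
 */
static enum ref_content_status classify_regular_ref(const char *content,
						    const char **garbage)
{
	const char *rest;

	*garbage = NULL;
	/* Stops at the NUL terminator if the content is too short. */
	for (size_t i = 0; i < HEXSZ; i++)
		if (!isxdigit((unsigned char)content[i]))
			return REF_CONTENT_BAD;

	rest = content + HEXSZ;
	if (!*rest)
		return REF_CONTENT_MISSING_NEWLINE;
	if (*rest == '\n' && !rest[1])
		return REF_CONTENT_OK;
	/* e.g. a 41st hex digit: the object ID itself is malformed. */
	if (!isspace((unsigned char)*rest))
		return REF_CONTENT_BAD;

	/* Consume one terminating LF; everything after it is garbage. */
	*garbage = (*rest == '\n') ? rest + 1 : rest;
	return REF_CONTENT_TRAILING_GARBAGE;
}
```

Returning a single status keeps the sketch short. A real checker may report several problems for one ref.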

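The symref tests in the range-diff separate two failure classes:
- A referent that is not a valid refname, such as "refs/heads/.branch" or "refs/heads/~branch", is reported as badReferentName (an error).
- A referent that is well-formed but lies outside "refs/", such as "refs-back/heads/branch" or "reflogs/refs/heads/branch", is reported as symrefTargetIsNotARef (a warning).

The following is a rough sketch of that decision order under several assumptions:
- `referent_name_ok()` is a hypothetical, much-simplified stand-in for `check_refname_format()`.
- "HEAD" is treated as the only acceptable root ref, whereas git knows more root refs.
- Only one status is returned per file.

All names here are illustrative, not git's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum symref_status {
	SYMREF_OK,
	SYMREF_NOT_A_SYMREF,      /* no "ref:" prefix */
	SYMREF_MISSING_NEWLINE,   /* cf. refMissingNewline */
	SYMREF_TRAILING_CONTENT,  /* cf. trailingRefContent */
	SYMREF_BAD_REFERENT_NAME, /* cf. badReferentName */
	SYMREF_TARGET_NOT_A_REF,  /* cf. symrefTargetIsNotARef */
};

/* Hypothetical subset of the refname rules; NOT check_refname_format(). */
static bool referent_name_ok(const char *name, size_t len)
{
	bool at_component_start = true;

	if (!len)
		return false;
	for (size_t i = 0; i < len; i++) {
		char c = name[i];
		if (at_component_start && c == '.')
			return false;
		if (strchr(" ~^:?*[\\", c))
			return false;
		at_component_start = (c == '/');
	}
	return true;
}

static enum symref_status classify_symref(const char *content)
{
	const char *referent, *end;
	size_t len;

	if (strncmp(content, "ref:", 4))
		return SYMREF_NOT_A_SYMREF;
	referent = content + 4;
	while (*referent == ' ') /* tolerate spaces after "ref:" */
		referent++;

	/* The referent ends at the first whitespace character (or NUL). */
	end = referent + strcspn(referent, " \t\n");
	len = (size_t)(end - referent);

	if (!*end)
		return SYMREF_MISSING_NEWLINE;
	if (strcmp(end, "\n"))
		return SYMREF_TRAILING_CONTENT;
	if (!referent_name_ok(referent, len))
		return SYMREF_BAD_REFERENT_NAME;
	if (!(len == 4 && !strncmp(referent, "HEAD", 4)) &&
	    strncmp(referent, "refs/", 5))
		return SYMREF_TARGET_NOT_A_REF;
	return SYMREF_OK;
}
```

With this order, "ref: refs/heads/main garbage", the bad symref mentioned in the RFC, is flagged as trailing content before its name is examined.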


* [PATCH v7 1/9] ref: initialize "fsck_ref_report" with zero
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
@ 2024-11-10 12:09               ` shejialuo
  2024-11-10 12:09               ` [PATCH v7 2/9] ref: check the full refname instead of basename shejialuo
                                 ` (9 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:09 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So, we need to always initialize these parameters to
NULL instead of letting them point anywhere when creating a new
"fsck_ref_report" structure.

The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL (which implicitly 0-initializes other
members in the struct). It is more customary to use "{ 0 }" to express
that we are 0-initializing everything. In order to align with the
codebase, initialize "fsck_ref_report" with zero.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0824c0b8a9..03d2503276 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,7 +3520,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.47.0
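Patch 1/9 relies on a C initialization rule. In a brace initializer, every member that is not named explicitly is initialized as if it had static storage duration, so pointer members become NULL. Both the old `{ .path = NULL }` form and the new `{ 0 }` form therefore leave "oid" and "referent" NULL. The change is about idiom, not behavior. A struct declared without any initializer would have indeterminate members. The struct below is a hypothetical stand-in carrying the three members named in the commit message, not git's real definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the members discussed in the patch. */
struct report_sketch {
	const char *path;
	const void *oid;
	const char *referent;
};

/* Old style: only "path" is named, the rest are implicitly zeroed. */
static struct report_sketch make_designated(void)
{
	struct report_sketch r = { .path = NULL };
	return r;
}

/* Customary style: "{ 0 }" zero-initializes every member. */
static struct report_sketch make_zeroed(void)
{
	struct report_sketch r = { 0 };
	return r;
}
```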



* [PATCH v7 2/9] ref: check the full refname instead of basename
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
  2024-11-10 12:09               ` [PATCH v7 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-11-10 12:09               ` shejialuo
  2024-11-10 12:09               ` [PATCH v7 3/9] ref: initialize ref name outside of check functions shejialuo
                                 ` (8 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:09 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "files-backend.c::files_fsck_refs_name", we validate the refname
format by using "check_refname_format" to check the basename of the
iterator with "REFNAME_ALLOW_ONELEVEL" flag.

However, this implementation is flawed. Although we don't allow a
single "@" in the ".git" directory, we do allow "refs/heads/@". Because
only the one-level refname "@" is checked, we wrongly report an error
when there is a "refs/heads/@" ref.

Because we only check the one-level refname, we also cannot check the
other components of the full refname, so we miss errors such as the
following:

  "refs/heads/ new-feature/test"
  "refs/heads/~new-feature/test"

In order to fix the above problem, enhance "files_fsck_refs_name" to use
the full name for "check_refname_format". Then, replace the tests which
are related to "@" and add tests to exercise the above situations, using
a for loop to avoid repetition.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     |  7 ++-
 t/t0602-reffiles-fsck.sh | 92 ++++++++++++++++++++++++----------------
 2 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 03d2503276..b055edc061 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3519,10 +3519,13 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
 		goto cleanup;
 
-	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
+	/*
+	 * This works right now because we never check the root refs.
+	 */
+	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
+	if (check_refname_format(sb.buf, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..2a172c913d 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -18,63 +18,81 @@ test_expect_success 'ref name should be checked' '
 	cd repo &&
 
 	git commit --allow-empty -m initial &&
-	git checkout -b branch-1 &&
-	git tag tag-1 &&
-	git commit --allow-empty -m second &&
-	git checkout -b branch-2 &&
-	git tag tag-2 &&
-	git tag multi_hierarchy/tag-2 &&
+	git checkout -b default-branch &&
+	git tag default-tag &&
+	git tag multi_hierarchy/default-tag &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/heads/.branch-1: badRefName: invalid refname format
-	EOF
-	rm $branch_dir_prefix/.branch-1 &&
-	test_cmp expect err &&
-
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/heads/@: badRefName: invalid refname format
-	EOF
+	cp $branch_dir_prefix/default-branch $branch_dir_prefix/@ &&
+	git refs verify 2>err &&
+	test_must_be_empty err &&
 	rm $branch_dir_prefix/@ &&
-	test_cmp expect err &&
 
-	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/@ &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/tags/multi_hierarchy/@: badRefName: invalid refname format
-	EOF
-	rm $tag_dir_prefix/multi_hierarchy/@ &&
-	test_cmp expect err &&
-
-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
+	cp $tag_dir_prefix/default-tag $tag_dir_prefix/tag-1.lock &&
 	git refs verify 2>err &&
 	rm $tag_dir_prefix/tag-1.lock &&
 	test_must_be_empty err &&
 
-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/.lock &&
+	cp $tag_dir_prefix/default-tag $tag_dir_prefix/.lock &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/tags/.lock: badRefName: invalid refname format
 	EOF
 	rm $tag_dir_prefix/.lock &&
-	test_cmp expect err
+	test_cmp expect err &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/$refname: badRefName: invalid refname format
+		EOF
+		rm "$branch_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $tag_dir_prefix/default-tag "$tag_dir_prefix/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/$refname: badRefName: invalid refname format
+		EOF
+		rm "$tag_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $tag_dir_prefix/multi_hierarchy/default-tag "$tag_dir_prefix/multi_hierarchy/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/multi_hierarchy/$refname: badRefName: invalid refname format
+		EOF
+		rm "$tag_dir_prefix/multi_hierarchy/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		mkdir "$branch_dir_prefix/$refname" &&
+		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname/default-branch" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/$refname/default-branch: badRefName: invalid refname format
+		EOF
+		rm -r "$branch_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done
 '
 
 test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	branch_dir_prefix=.git/refs/heads &&
-	tag_dir_prefix=.git/refs/tags &&
 	cd repo &&
 	git commit --allow-empty -m initial &&
 	git checkout -b branch-1 &&
-	git tag tag-1 &&
-	git commit --allow-empty -m second &&
-	git checkout -b branch-2 &&
-	git tag tag-2 &&
 
 	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	git -c fsck.badRefName=warn refs verify 2>err &&
@@ -84,7 +102,7 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	rm $branch_dir_prefix/.branch-1 &&
 	test_cmp expect err &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	git -c fsck.badRefName=ignore refs verify 2>err &&
 	test_must_be_empty err
 '
-- 
2.47.0
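The difference that patch 2/9 fixes can be shown with a toy checker. The sketch below implements only a hypothetical subset of the rules enforced by `check_refname_format()`:
- a lone "@" is rejected;
- a component may not be empty or start with ".";
- a component may not end with ".lock" or contain "..";
- a few characters (space, "~", "^", ":", "?", "*", "[", "\") are forbidden.

`refname_ok()` and `allow_onelevel` are illustrative names; the flag loosely mimics REFNAME_ALLOW_ONELEVEL. Checking only the basename rejects "@" even though "refs/heads/@" is valid, and it accepts "test" even though "refs/heads/~new-feature/test" is invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-component subset of the refname rules. */
static bool component_ok(const char *c, size_t len)
{
	if (!len || c[0] == '.')
		return false;
	if (len >= 5 && !memcmp(c + len - 5, ".lock", 5))
		return false;
	for (size_t i = 0; i < len; i++) {
		if (strchr(" ~^:?*[\\", c[i]))
			return false;
		if (c[i] == '.' && i + 1 < len && c[i + 1] == '.')
			return false;
	}
	return true;
}

/* Walks every "/"-separated component of the full name. */
static bool refname_ok(const char *refname, bool allow_onelevel)
{
	const char *p = refname;

	if (!strcmp(refname, "@"))
		return false;
	if (!allow_onelevel && !strchr(refname, '/'))
		return false;
	for (;;) {
		const char *slash = strchr(p, '/');
		size_t len = slash ? (size_t)(slash - p) : strlen(p);

		if (!component_ok(p, len))
			return false;
		if (!slash)
			break;
		p = slash + 1;
	}
	return true;
}
```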



* [PATCH v7 3/9] ref: initialize ref name outside of check functions
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
  2024-11-10 12:09               ` [PATCH v7 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-11-10 12:09               ` [PATCH v7 2/9] ref: check the full refname instead of basename shejialuo
@ 2024-11-10 12:09               ` shejialuo
  2024-11-10 12:09               ` [PATCH v7 4/9] ref: support multiple worktrees check for refs shejialuo
                                 ` (7 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:09 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We pass "refs_check_dir" to the "files_fsck_refs_name" function, which
allows it to construct the checked ref name later. However, whenever we
introduce a new check function, we would have to allocate redundant
memory and re-calculate the ref name, duplicating both the allocation
and the logic. Instead, we should allocate and calculate the name only
once and pass it to the check functions.

In order not to repeat the calculation, rename "refs_check_dir" to
"refname". And in "files_fsck_refs_dir", create a new strbuf "refname",
so that whenever we handle a new ref, we calculate its name once and
then call the check functions one by one.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index b055edc061..8edb700568 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3501,12 +3501,12 @@ static int files_ref_store_remove_on_disk(struct ref_store *ref_store,
  */
 typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  struct fsck_options *o,
-				  const char *refs_check_dir,
+				  const char *refname,
 				  struct dir_iterator *iter);
 
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
-				const char *refs_check_dir,
+				const char *refname,
 				struct dir_iterator *iter)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -3522,11 +3522,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	/*
 	 * This works right now because we never check the root refs.
 	 */
-	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
-	if (check_refname_format(sb.buf, 0)) {
+	if (check_refname_format(refname, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		report.path = sb.buf;
+		report.path = refname;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
 				      "invalid refname format");
@@ -3542,6 +3541,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       const char *refs_check_dir,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
+	struct strbuf refname = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	struct dir_iterator *iter;
 	int iter_status;
@@ -3560,11 +3560,15 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			continue;
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
+			strbuf_reset(&refname);
+			strbuf_addf(&refname, "%s/%s", refs_check_dir,
+				    iter->relative_path);
+
 			if (o->verbose)
-				fprintf_ln(stderr, "Checking %s/%s",
-					   refs_check_dir, iter->relative_path);
+				fprintf_ln(stderr, "Checking %s", refname.buf);
+
 			for (size_t i = 0; fsck_refs_fn[i]; i++) {
-				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
+				if (fsck_refs_fn[i](ref_store, o, refname.buf, iter))
 					ret = -1;
 			}
 		} else {
@@ -3581,6 +3585,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 
 out:
 	strbuf_release(&sb);
+	strbuf_release(&refname);
 	return ret;
 }
 
-- 
2.47.0
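The following sketch illustrates the calling pattern patch 3/9 moves to. The full name is built once per ref, and that one string is passed to each callback in a NULL-terminated array of check functions. The callbacks and `check_refs()` are hypothetical. A fixed-size buffer with snprintf() stands in for git's growable strbuf, so very long names would be truncated here but not in git.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check callback: returns non-zero when the ref is bad. */
typedef int (*ref_check_fn)(const char *refname);

static int check_has_no_space(const char *refname)
{
	return strchr(refname, ' ') != NULL;
}

static int check_no_double_dot(const char *refname)
{
	return strstr(refname, "..") != NULL;
}

/*
 * Build "<dir>/<relative path>" once per ref, then hand the same string
 * to every check in the NULL-terminated "checks" array.
 */
static int check_refs(const char *refs_check_dir, const char **relative_paths,
		      size_t nr, ref_check_fn *checks)
{
	char refname[256];
	int ret = 0;

	for (size_t i = 0; i < nr; i++) {
		snprintf(refname, sizeof(refname), "%s/%s",
			 refs_check_dir, relative_paths[i]);
		for (size_t j = 0; checks[j]; j++)
			if (checks[j](refname))
				ret = -1; /* keep going to report every problem */
	}
	return ret;
}
```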



* [PATCH v7 4/9] ref: support multiple worktrees check for refs
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (2 preceding siblings ...)
  2024-11-10 12:09               ` [PATCH v7 3/9] ref: initialize ref name outside of check functions shejialuo
@ 2024-11-10 12:09               ` shejialuo
  2024-11-10 12:09               ` [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
                                 ` (6 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:09 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already set up the infrastructure to check the consistency of
refs, but we do not support multiple worktrees. However, "git-fsck(1)"
checks the refs of worktrees. As we aim for feature parity with
"git-fsck(1)", we need to set up support for multiple worktrees.

Because each worktree has its own specific refs, instead of just showing
the users "refs/worktree/foo", we need to display the full name such as
"worktrees/<id>/refs/worktree/foo". So we need to know the id of the
worktree to get the full name. Add a new parameter "struct worktree *"
to "refs-internal.h::fsck_fn". Then change the related functions to
follow this new interface.

The "packed-refs" file only exists in the main worktree, so we should
only check it there. Use "is_main_worktree" to skip checking
"packed-refs" in the "packed_fsck" function.

Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add
the "worktrees/<id>/" prefix when we are not in the main worktree.

Last, add a new test to check the refname when there are multiple
worktrees to exercise the code.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/refs.c           | 10 ++++++--
 refs.c                   |  5 ++--
 refs.h                   |  3 ++-
 refs/debug.c             |  5 ++--
 refs/files-backend.c     | 17 ++++++++++----
 refs/packed-backend.c    |  8 ++++++-
 refs/refs-internal.h     |  3 ++-
 refs/reftable-backend.c  |  3 ++-
 t/t0602-reffiles-fsck.sh | 51 ++++++++++++++++++++++++++++++++++++++++
 9 files changed, 90 insertions(+), 15 deletions(-)

diff --git a/builtin/refs.c b/builtin/refs.c
index 24978a7b7b..394b4101c6 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "refs.h"
 #include "strbuf.h"
+#include "worktree.h"
 
 #define REFS_MIGRATE_USAGE \
 	N_("git refs migrate --ref-format=<format> [--dry-run]")
@@ -66,6 +67,7 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
 static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 {
 	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
+	struct worktree **worktrees;
 	const char * const verify_usage[] = {
 		REFS_VERIFY_USAGE,
 		NULL,
@@ -75,7 +77,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
 		OPT_END(),
 	};
-	int ret;
+	int ret = 0;
 
 	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
 	if (argc)
@@ -84,9 +86,13 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 	git_config(git_fsck_config, &fsck_refs_options);
 	prepare_repo_settings(the_repository);
 
-	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
+	worktrees = get_worktrees();
+	for (size_t i = 0; worktrees[i]; i++)
+		ret |= refs_fsck(get_worktree_ref_store(worktrees[i]),
+				 &fsck_refs_options, worktrees[i]);
 
 	fsck_options_clear(&fsck_refs_options);
+	free_worktrees(worktrees);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 5f729ed412..395a17273c 100644
--- a/refs.c
+++ b/refs.c
@@ -318,9 +318,10 @@ int check_refname_format(const char *refname, int flags)
 	return check_or_sanitize_refname(refname, flags, NULL);
 }
 
-int refs_fsck(struct ref_store *refs, struct fsck_options *o)
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt)
 {
-	return refs->be->fsck(refs, o);
+	return refs->be->fsck(refs, o, wt);
 }
 
 void sanitize_refname_component(const char *refname, struct strbuf *out)
diff --git a/refs.h b/refs.h
index 108dfc93b3..341d43239c 100644
--- a/refs.h
+++ b/refs.h
@@ -549,7 +549,8 @@ int check_refname_format(const char *refname, int flags);
  * reflogs are consistent, and non-zero otherwise. The errors will be
  * written to stderr.
  */
-int refs_fsck(struct ref_store *refs, struct fsck_options *o);
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt);
 
 /*
  * Apply the rules from check_refname_format, but mutate the result until it
diff --git a/refs/debug.c b/refs/debug.c
index 45e2e784a0..72e80ddd6d 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -420,10 +420,11 @@ static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
 }
 
 static int debug_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
-	int res = drefs->refs->be->fsck(drefs->refs, o);
+	int res = drefs->refs->be->fsck(drefs->refs, o, wt);
 	trace_printf_key(&trace_refs, "fsck: %d\n", res);
 	return res;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8edb700568..8bfdce64bc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -23,6 +23,7 @@
 #include "../dir.h"
 #include "../chdir-notify.h"
 #include "../setup.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../revision.h"
@@ -3539,6 +3540,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       struct fsck_options *o,
 			       const char *refs_check_dir,
+			       struct worktree *wt,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
 	struct strbuf refname = STRBUF_INIT;
@@ -3561,6 +3563,9 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
 			strbuf_reset(&refname);
+
+			if (!is_main_worktree(wt))
+				strbuf_addf(&refname, "worktrees/%s/", wt->id);
 			strbuf_addf(&refname, "%s/%s", refs_check_dir,
 				    iter->relative_path);
 
@@ -3590,7 +3595,8 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 }
 
 static int files_fsck_refs(struct ref_store *ref_store,
-			   struct fsck_options *o)
+			   struct fsck_options *o,
+			   struct worktree *wt)
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
@@ -3599,17 +3605,18 @@ static int files_fsck_refs(struct ref_store *ref_store,
 
 	if (o->verbose)
 		fprintf_ln(stderr, _("Checking references consistency"));
-	return files_fsck_refs_dir(ref_store, o,  "refs", fsck_refs_fn);
+	return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
 }
 
 static int files_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ, "fsck");
 
-	return files_fsck_refs(ref_store, o) |
-	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
+	return files_fsck_refs(ref_store, o, wt) |
+	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt);
 }
 
 struct ref_storage_be refs_be_files = {
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 07c57fd541..46dcaec654 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -13,6 +13,7 @@
 #include "../lockfile.h"
 #include "../chdir-notify.h"
 #include "../statinfo.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../trace2.h"
@@ -1754,8 +1755,13 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 }
 
 static int packed_fsck(struct ref_store *ref_store UNUSED,
-		       struct fsck_options *o UNUSED)
+		       struct fsck_options *o UNUSED,
+		       struct worktree *wt)
 {
+
+	if (!is_main_worktree(wt))
+		return 0;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..037d7991cd 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -653,7 +653,8 @@ typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refnam
 				 struct strbuf *referent);
 
 typedef int fsck_fn(struct ref_store *ref_store,
-		    struct fsck_options *o);
+		    struct fsck_options *o,
+		    struct worktree *wt);
 
 struct ref_storage_be {
 	const char *name;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index f5f957e6de..b6a63c1015 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2443,7 +2443,8 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 }
 
 static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
-			    struct fsck_options *o UNUSED)
+			    struct fsck_options *o UNUSED,
+			    struct worktree *wt UNUSED)
 {
 	return 0;
 }
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 2a172c913d..1e17393a3d 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -107,4 +107,55 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'ref name check should work for multiple worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	cd repo &&
+	test_commit initial &&
+	git checkout -b branch-1 &&
+	test_commit second &&
+	git checkout -b branch-2 &&
+	test_commit third &&
+	git checkout -b branch-3 &&
+	git worktree add ./worktree-1 branch-1 &&
+	git worktree add ./worktree-2 branch-2 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+
+	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/'\'' branch-5'\'' &&
+	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/'\''~branch-6'\'' &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+	error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err &&
+
+	for worktree in "worktree-1" "worktree-2"
+	do
+		(
+			cd $worktree &&
+			test_must_fail git refs verify 2>err &&
+			cat >expect <<-EOF &&
+			error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+			error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+			EOF
+			sort err >sorted_err &&
+			test_cmp expect sorted_err || return 1
+		)
+	done
+'
+
 test_done
-- 
2.47.0
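Patch 4/9 makes two worktree-dependent decisions:
- Refs checked in a linked worktree get a "worktrees/<id>/" prefix, which produces names such as "worktrees/worktree-1/refs/worktree/branch-4" in the test.
- "packed-refs" is only checked for the main worktree.

The sketch below imitates both decisions. `struct worktree_sketch` is a hypothetical stand-in that carries only an id and an is-main flag. It is not git's "struct worktree" and does not use is_main_worktree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two facts the patch needs. */
struct worktree_sketch {
	const char *id; /* e.g. "worktree-1"; unused for the main worktree */
	bool is_main;
};

/* Roughly imitates the prefixing done in files_fsck_refs_dir(). */
static void format_checked_refname(char *buf, size_t size,
				   const struct worktree_sketch *wt,
				   const char *refs_check_dir,
				   const char *relative_path)
{
	if (wt->is_main)
		snprintf(buf, size, "%s/%s", refs_check_dir, relative_path);
	else
		snprintf(buf, size, "worktrees/%s/%s/%s", wt->id,
			 refs_check_dir, relative_path);
}

/* "packed-refs" lives only in the main worktree. */
static bool should_check_packed_refs(const struct worktree_sketch *wt)
{
	return wt->is_main;
}
```

The caller in builtin/refs.c combines the per-worktree results with "ret |= ...". A failure in any worktree therefore makes the whole command fail, while every worktree still gets checked.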



* [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (3 preceding siblings ...)
  2024-11-10 12:09               ` [PATCH v7 4/9] ref: support multiple worktrees check for refs shejialuo
@ 2024-11-10 12:09               ` shejialuo
  2024-11-13  7:36                 ` Patrick Steinhardt
  2024-11-10 12:10               ` [PATCH v7 6/9] ref: add more strict checks for regular refs shejialuo
                                 ` (5 subsequent siblings)
  10 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:09 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

"git-fsck(1)" implicitly checks the ref content by passing the
callback "fsck_handle_ref" to "refs.c::refs_for_each_rawref". It then
checks whether the ref content (ultimately the "oid") is valid and, if
not, reports the following error to the user:

  error: refs/heads/main: invalid sha1 pointer 0000...

It also wrongly reports the above error for dangling symrefs in the
repository. This does not align with the behavior of the
"git symbolic-ref" command, which allows users to create dangling
symrefs.

Since we have already introduced the "git refs verify" command, we
should check the ref content explicitly there. Later, we could remove
these checks from "git-fsck(1)" and instead launch a "git refs verify"
subprocess from "git-fsck(1)", which would make "git-fsck(1)" cleaner.

Following what "git-fsck(1)" does, add a similar check to "git refs
verify", and add a new fsck error message "badRefContent(ERROR)" to
report that a ref has invalid content.
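To make "invalid content" concrete for a regular ref, here is a minimal,
hypothetical sketch of the acceptance rule. It assumes a SHA-1
repository (40 hex digits) and ignores the "ref:" symref branch; the
name "parse_regular_ref_sketch" is illustrative only, and the real
parsing is done by "parse_loose_ref_contents".

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: SHA-1 repository, so an object ID is 40 hex digits. */
#define SKETCH_HEXSZ 40

/*
 * Hypothetical, simplified model of how a loose regular ref is parsed:
 * the content must start with a full hex object ID, and the next byte
 * must be NUL or whitespace. On success, *trailing points just past the
 * object ID so that later checks can inspect what follows.
 * Returns 0 on success and -1 for "bad ref content".
 */
static int parse_regular_ref_sketch(const char *buf, const char **trailing)
{
	size_t i;

	/* Stops at the first non-hex byte, including an early NUL. */
	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return -1;

	/* "<oid>x" is rejected; "<oid>", "<oid>\n" and "<oid> junk" are not. */
	if (buf[SKETCH_HEXSZ] && !isspace((unsigned char)buf[SKETCH_HEXSZ]))
		return -1;

	*trailing = buf + SKETCH_HEXSZ;
	return 0;
}
```

Under this model, the test inputs "<oid>x", "xfsazqfxcadas" and
"Xfsazqfxcadas" are all rejected, which is what "badRefContent" reports.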

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  43 ++++++++++++++
 t/t0602-reffiles-fsck.sh      | 105 ++++++++++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8bfdce64bc..2d126ecbbe 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3505,6 +3505,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refname,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *target_name,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct fsck_ref_report report = { 0 };
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	report.path = target_name;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "cannot read ref file '%s': %s",
+				      iter->path.buf, strerror(errno));
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		strbuf_rtrim(&ref_content);
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "%s", ref_content.buf);
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refname,
@@ -3600,6 +3642,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 1e17393a3d..162370077b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -158,4 +158,109 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 	done
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/branch-bad &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/a/b/branch-bad &&
+		test_cmp expect err || return 1
+	done
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	bad_content_1=$(git rev-parse main)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
+	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
+	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
+test_expect_success 'ref content checks should work with worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree1_refdir_prefix/bad-branch-1 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content
+		EOF
+		rm $worktree1_refdir_prefix/bad-branch-1 &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree2_refdir_prefix/bad-branch-2 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content
+		EOF
+		rm $worktree2_refdir_prefix/bad-branch-2 &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_done
-- 
2.47.0



* [PATCH v7 6/9] ref: add more strict checks for regular refs
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (4 preceding siblings ...)
  2024-11-10 12:09               ` [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-11-10 12:10               ` shejialuo
  2024-11-10 12:10               ` [PATCH v7 7/9] ref: add basic symref content check for files backend shejialuo
                                 ` (4 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:10 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We already use the "parse_loose_ref_contents" function to check
whether the ref content is valid in the files backend. However,
"parse_loose_ref_contents" allows the ref's content to end with
garbage or without a newline.

Even though we never create such loose refs ourselves, we have always
accepted them, so it is entirely possible that some third-party tools
rely on such loose refs being valid. We should therefore not report an
fsck error for them at present. Instead, we should notify users about
such "curiously formatted" loose refs so that adequate care is taken
before we decide to tighten the rules in the future.

Reporting an fsck warning is not suitable either. We don't yet want the
"--strict" flag, which controls this bit, to end up generating errors
for such weirdly-formatted ref contents, as we first want to assess
whether this retroactive tightening will cause issues for any tools out
there; it may introduce compatibility issues that break repositories.
So, we add the following two fsck infos to represent the situations
where the ref content ends without a newline or has trailing garbage:

1. refMissingNewline(INFO): A loose ref does not end with a
   newline (LF).
2. trailingRefContent(INFO): A loose ref has trailing content.
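Once the parser reports where the object ID ends, telling these two
infos apart is a small decision on the remaining bytes. The sketch below
mirrors the decision this patch adds to "files_fsck_refs_content" as a
standalone function; "classify_trailing" and the enum names are
hypothetical, chosen only for illustration.

```c
/* Hypothetical names for the three outcomes of the trailing-content check. */
enum trailing_status {
	TRAILING_OK,         /* exactly one LF after the object ID */
	TRAILING_MISSING_LF, /* would be reported as refMissingNewline */
	TRAILING_GARBAGE,    /* would be reported as trailingRefContent */
};

/*
 * 'trailing' points just past the parsed object ID of a regular ref,
 * as returned through the new 'trailing' out-parameter.
 */
static enum trailing_status classify_trailing(const char *trailing)
{
	if (!*trailing)
		return TRAILING_MISSING_LF;
	/* Anything other than a single final LF counts as trailing content. */
	if (*trailing != '\n' || trailing[1])
		return TRAILING_GARBAGE;
	return TRAILING_OK;
}
```

Note that "<oid>\n\n\n" is classified as trailing content, not as a
valid ref, matching the "branch-garbage-special" test below.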

It might appear that using FSCK_INFO means we cannot warn the user at
all. However, "fsck.c::fsck_vreport" converts FSCK_INFO to FSCK_WARN,
so "git refs verify" still warns the user about these situations
without introducing compatibility issues.
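A minimal model of why INFO is the right severity here. It assumes, as
we understand fsck's handling, that "--strict" only upgrades WARN to
ERROR, and that the reporting step then shows INFO as a warning. The
enum and "effective_type" are hypothetical stand-ins, not Git's code.

```c
/* Hypothetical mirror of the fsck message severities. */
enum sketch_msg_type { SKETCH_IGNORE, SKETCH_INFO, SKETCH_WARN, SKETCH_ERROR };

/*
 * Simplified model (an assumption, not Git's actual code) of two steps:
 * 1. "--strict" upgrades WARN to ERROR, but leaves INFO alone;
 * 2. when the message is reported, INFO is shown as a warning.
 */
static enum sketch_msg_type effective_type(enum sketch_msg_type configured,
					   int strict)
{
	enum sketch_msg_type t = configured;

	if (strict && t == SKETCH_WARN)
		t = SKETCH_ERROR;
	if (t == SKETCH_INFO)
		t = SKETCH_WARN;
	return t;
}
```

Under this model, an INFO message surfaces as a warning with or without
"--strict", whereas a WARN message would become an error in strict mode.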

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt | 14 +++++++++
 fsck.h                        |  2 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 26 ++++++++++++++--
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 57 +++++++++++++++++++++++++++++++++--
 6 files changed, 96 insertions(+), 7 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 22c385ea22..6db0eaa84a 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -173,6 +173,20 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A loose ref that does not end with newline(LF). As
+	valid implementations of Git never created such a loose ref
+	file, it may become an error in the future. Report to the
+	git@vger.kernel.org mailing list if you see this error, as
+	we need to know what tools created such a file.
+
+`trailingRefContent`::
+	(INFO) A loose ref has trailing content. As valid implementations
+	of Git never created such a loose ref file, it may become an
+	error in the future. Report to the git@vger.kernel.org mailing
+	list if you see this error, as we need to know what tools
+	created such a file.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 0d99a87911..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -85,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 395a17273c..f88b32a633 100644
--- a/refs.c
+++ b/refs.c
@@ -1789,7 +1789,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2d126ecbbe..871c8946f8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -569,7 +569,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -606,7 +606,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -628,6 +628,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3513,6 +3517,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
+	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
 	struct object_id oid;
@@ -3533,7 +3538,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
 				     ref_content.buf, &oid, &referent,
-				     &type, &failure_errno)) {
+				     &type, &trailing, &failure_errno)) {
 		strbuf_rtrim(&ref_content);
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
@@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
+	if (!(type & REF_ISSYMREF)) {
+		if (!*trailing) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "misses LF at the end");
+			goto cleanup;
+		}
+		if (*trailing != '\n' || *(trailing + 1)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "has trailing garbage: '%s'", trailing);
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 037d7991cd..125f1fe735 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -716,7 +716,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 162370077b..33e7a390ad 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -189,7 +189,48 @@ test_expect_success 'regular ref content should be checked (individual)' '
 		EOF
 		rm $branch_dir_prefix/a/b/branch-bad &&
 		test_cmp expect err || return 1
-	done
+	done &&
+
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	for trailing_content in " garbage" "    more garbage"
+	do
+		printf "%s" "$(git rev-parse main)$trailing_content" >$branch_dir_prefix/branch-garbage &&
+		git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\''$trailing_content'\''
+		EOF
+		rm $branch_dir_prefix/branch-garbage &&
+		test_cmp expect err || return 1
+	done &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
+
+
+	'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage-special &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
+
+
+	  garbage'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage-special &&
+	test_cmp expect err
 '
 
 test_expect_success 'regular ref content should be checked (aggregate)' '
@@ -207,12 +248,16 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
 	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
 	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
 	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
 	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
@@ -260,7 +305,15 @@ test_expect_success 'ref content checks should work with worktrees' '
 		EOF
 		rm $worktree2_refdir_prefix/bad-branch-2 &&
 		test_cmp expect err || return 1
-	done
+	done &&
+
+	printf "%s" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err
 '
 
 test_done
-- 
2.47.0



* [PATCH v7 7/9] ref: add basic symref content check for files backend
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (5 preceding siblings ...)
  2024-11-10 12:10               ` [PATCH v7 6/9] ref: add more strict checks for regular refs shejialuo
@ 2024-11-10 12:10               ` shejialuo
  2024-11-10 12:10               ` [PATCH v7 8/9] ref: check whether the target of the symref is a ref shejialuo
                                 ` (3 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:10 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have code that checks regular ref contents, but we do not yet check
the contents of symbolic refs. By using "parse_loose_ref_contents" for
symbolic refs, we get the "referent".

We do not need to check the "referent" by opening its file. If the
"referent" exists in the filesystem, its correctness will eventually be
checked anyway, because we inspect every file in the "refs" directory.
If the "referent" does not exist in the filesystem, that is fine, as
the symref is then simply a dangling symref.

So we only need to check the "referent" string itself. A ref file is
accepted as a textual symref if it begins with "ref:", followed by zero
or more whitespace characters, followed by the full refname, followed
only by whitespace characters. However, we always write a single SP
after "ref:" and a single LF after the refname. It may seem that we
should report an fsck error when the content does not follow the
latter, stricter rules, but we should not be so aggressive, because
third-party reimplementations of Git may have taken advantage of the
looser syntax. More specifically, we accept the following contents:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"
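All three contents parse successfully because, as far as we can tell
from "parse_loose_ref_contents", the symref branch only strips the
"ref:" prefix and the whitespace after it, leaving any trailing bytes
in the "referent". A minimal sketch of that behavior, using the
hypothetical name "extract_referent":

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical sketch of the symref branch of the loose-ref parser:
 * skip the "ref:" prefix and any leading whitespace, and keep
 * everything else, trailing whitespace included, as the referent.
 * Returns NULL if the content is not a textual symref.
 */
static const char *extract_referent(const char *buf)
{
	if (strncmp(buf, "ref:", 4))
		return NULL;
	buf += 4;
	while (isspace((unsigned char)*buf))
		buf++;
	return buf;
}
```

This is why the trailing whitespace has to be examined separately, after
parsing, rather than being rejected by the parser itself.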

When introducing the regular ref content checks, we created two fsck
infos, "refMissingNewline" and "trailingRefContent", which represent
exactly the above situations. So we reuse these two fsck messages to
inform the user about these situations.

But we do not allow any other trailing garbage. The following are bad
symref contents, which will be reported as fsck errors:

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "
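After trimming, both examples still contain a space inside the referent,
which is what makes the refname check reject them. The sketch below
models only a few of the rules documented in git-check-ref-format(1):
no control characters or spaces, none of "~^:?*[\", no "..", and no
path component starting with ".". "refname_subset_ok" is a hypothetical
name and is not a replacement for "check_refname_format".

```c
#include <string.h>

/*
 * Hypothetical, deliberately incomplete subset of the refname rules
 * enforced by check_refname_format(). Returns 0 if 'name' passes these
 * checks, -1 otherwise.
 */
static int refname_subset_ok(const char *name)
{
	const char *p;
	int at_component_start = 1;

	if (!*name || strstr(name, ".."))
		return -1;
	for (p = name; *p; p++) {
		unsigned char c = (unsigned char)*p;

		/* Control characters, space, and ~ ^ : ? * [ \ are forbidden. */
		if (c < 0x20 || c == 0x7f || strchr(" ~^:?*[\\", c))
			return -1;
		/* No path component may begin with a dot. */
		if (at_component_start && c == '.')
			return -1;
		at_component_start = (c == '/');
	}
	return 0;
}
```

The same subset also rejects the ".branch", "~branch" and "?branch"
referents used in the tests of this patch.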

We introduce a new "badReferentName(ERROR)" fsck message to report the
above errors, using "is_root_ref" and "check_refname_format" to check
the "referent". Since neither "is_root_ref" nor "check_refname_format"
accepts trailing whitespace, we pass them the trimmed "referent".

To implement these checks, we do the following:

1. Record the untrimmed length "orig_len" and the untrimmed last byte
   "orig_last_byte".
2. Use "strbuf_rtrim" to trim trailing whitespace and newlines so that
   "is_root_ref" and "check_refname_format" do not fail because of them.
3. Use "orig_len" and "orig_last_byte" to check whether the "referent"
   misses '\n' at the end or has trailing whitespace or newlines.
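The three steps above can be sketched on a plain C string instead of a
strbuf. "check_symref_trailing" and the flag names are hypothetical; the
length and last-byte comparisons follow "files_fsck_symref_target" in
this patch. The empty-referent guard is an addition of this sketch so
that it never reads before the start of the buffer.

```c
#include <ctype.h>
#include <string.h>

#define SYMREF_MISSING_NEWLINE  (1 << 0) /* refMissingNewline */
#define SYMREF_TRAILING_CONTENT (1 << 1) /* trailingRefContent */

/*
 * 'referent' is the modifiable text after "ref:" and its leading
 * whitespace. Trims trailing whitespace in place (like strbuf_rtrim)
 * and returns a bitmask of the infos that would be reported.
 */
static int check_symref_trailing(char *referent)
{
	size_t orig_len = strlen(referent);
	size_t len = orig_len;
	char orig_last_byte;
	int flags = 0;

	if (!orig_len)
		return 0; /* sketch-only guard: leave this to the refname check */

	/* Step 1: remember the untrimmed length and last byte. */
	orig_last_byte = referent[orig_len - 1];

	/* Step 2: trim trailing whitespace and newlines. */
	while (len && isspace((unsigned char)referent[len - 1]))
		len--;
	referent[len] = '\0';

	/* Step 3: nothing trimmed, or the last byte was not LF. */
	if (len == orig_len || orig_last_byte != '\n')
		flags |= SYMREF_MISSING_NEWLINE;
	/* More than one byte trimmed means extra whitespace or newlines. */
	if (len != orig_len && len != orig_len - 1)
		flags |= SYMREF_TRAILING_CONTENT;
	return flags;
}
```

For example, "refs/heads/branch \n  " ends in a space and has four
trailing bytes, so both flags are set, as in the "branch-complicated"
test below.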

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  40 ++++++++++++
 t/t0602-reffiles-fsck.sh      | 111 ++++++++++++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 6db0eaa84a..dcea05edfc 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..5227dfdef2 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 871c8946f8..8bc7c6e0c2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3509,6 +3509,43 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refname,
 				  struct dir_iterator *iter);
 
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent)
+{
+	char orig_last_byte;
+	size_t orig_len;
+	int ret = 0;
+
+	orig_len = referent->len;
+	orig_last_byte = referent->buf[orig_len - 1];
+	strbuf_rtrim(referent);
+
+	if (!is_root_ref(referent->buf) &&
+	    check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_NAME,
+				      "points to invalid refname '%s'", referent->buf);
+		goto out;
+	}
+
+	if (referent->len == orig_len ||
+	    (referent->len < orig_len && orig_last_byte != '\n')) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "misses LF at the end");
+	}
+
+	if (referent->len != orig_len && referent->len != orig_len - 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "has trailing whitespaces or newlines");
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *target_name,
@@ -3559,6 +3596,9 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "has trailing garbage: '%s'", trailing);
 			goto cleanup;
 		}
+	} else {
+		ret = files_fsck_symref_target(o, &report, &referent);
+		goto cleanup;
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 33e7a390ad..ee1e5f2864 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -263,6 +263,109 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	for good_referent in "refs/heads/branch" "HEAD"
+	do
+		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
+		git refs verify 2>err &&
+		rm $branch_dir_prefix/branch-good &&
+		test_must_be_empty err || return 1
+	done &&
+
+	for bad_referent in "refs/heads/.branch" "refs/heads/~branch" "refs/heads/?branch"
+	do
+		printf "ref: %s\n" $bad_referent >$branch_dir_prefix/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-bad: badReferentName: points to invalid refname '\''$bad_referent'\''
+		EOF
+		rm $branch_dir_prefix/branch-bad &&
+		test_cmp expect err || return 1
+	done &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -313,6 +416,14 @@ test_expect_success 'ref content checks should work with worktrees' '
 	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $worktree1_refdir_prefix/branch-garbage &&
 	test_cmp expect err
 '
 
-- 
2.47.0



* [PATCH v7 8/9] ref: check whether the target of the symref is a ref
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (6 preceding siblings ...)
  2024-11-10 12:10               ` [PATCH v7 7/9] ref: add basic symref content check for files backend shejialuo
@ 2024-11-10 12:10               ` shejialuo
  2024-11-10 12:10               ` [PATCH v7 9/9] ref: add symlink ref content check for files backend shejialuo
                                 ` (2 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:10 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Ideally, we want users to use "git symbolic-ref" to create symrefs
instead of writing raw contents into the filesystem. However, "git
symbolic-ref" is strict with the refname but not with the referent.
For example, we can point the "referent" at "$(gitdir)/logs/aaa" and
manually write content into that file, and "git rev-parse" can still
successfully resolve the symref:

  $ git init repo && cd repo && git commit --allow-empty -mx
  $ git symbolic-ref refs/heads/test logs/aaa
  $ echo $(git rev-parse HEAD) > .git/logs/aaa
  $ git rev-parse test

We may need to add some restrictions on the "referent" parameter of
"git symbolic-ref", because ideally all non-pseudo refs should be
located under the "refs" directory, and we may tighten this in the
future.

To tell users that we may tighten the rules for the above situation,
create a new fsck message "symrefTargetIsNotARef" to notify them that
this may become an error in the future.
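The rule stated in the new documentation entry is that a symref target
should be either a root ref or a ref starting with "refs/". A minimal
sketch under that assumption follows. "is_root_ref_sketch" is a
hypothetical, simplified stand-in for "is_root_ref": it only accepts
all-uppercase names that are "HEAD" or end in "_HEAD", while Git's real
helper may recognize additional names.

```c
#include <string.h>

/* Simplified stand-in for is_root_ref() (an assumption for illustration). */
static int is_root_ref_sketch(const char *name)
{
	size_t len = strlen(name);
	const char *p;

	if (!len)
		return 0;
	for (p = name; *p; p++)
		if (!((*p >= 'A' && *p <= 'Z') || *p == '_'))
			return 0;
	return !strcmp(name, "HEAD") ||
	       (len > 5 && !strcmp(name + len - 5, "_HEAD"));
}

/* Returns 1 when "symrefTargetIsNotARef" would be reported. */
static int symref_target_is_not_a_ref(const char *referent)
{
	if (is_root_ref_sketch(referent))
		return 0;
	return strncmp(referent, "refs/", 5) != 0;
}
```

With this rule, the "logs/aaa" referent from the example above would be
flagged, while "HEAD" and "refs/heads/main" would not.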

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 +++++++++
 fsck.h                        |  1 +
 refs/files-backend.c          | 14 ++++++++++++--
 t/t0602-reffiles-fsck.sh      | 29 +++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index dcea05edfc..f82ebc58e8 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,15 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symrefTargetIsNotARef`::
+	(INFO) The target of a symbolic reference points neither to
+	a root reference nor to a reference starting with "refs/".
+	Although `git symbolic-ref` allows creating a symref whose
+	referent is outside "refs/", we may tighten the rule in the
+	future. Report to the git@vger.kernel.org
+	mailing list if you see this error, as we need to know what tools
+	created such a file.
+
 `trailingRefContent`::
 	(INFO) A loose ref has trailing content. As valid implementations
 	of Git never created such a loose ref file, it may become an
diff --git a/fsck.h b/fsck.h
index 5227dfdef2..53a47612e6 100644
--- a/fsck.h
+++ b/fsck.h
@@ -87,6 +87,7 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8bc7c6e0c2..b3ec409920 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3513,6 +3513,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    struct strbuf *referent)
 {
+	int is_referent_root;
 	char orig_last_byte;
 	size_t orig_len;
 	int ret = 0;
@@ -3521,8 +3522,17 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	orig_last_byte = referent->buf[orig_len - 1];
 	strbuf_rtrim(referent);
 
-	if (!is_root_ref(referent->buf) &&
-	    check_refname_format(referent->buf, 0)) {
+	is_referent_root = is_root_ref(referent->buf);
+	if (!is_referent_root &&
+	    !starts_with(referent->buf, "refs/") &&
+	    !starts_with(referent->buf, "worktrees/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF,
+				      "points to non-ref target '%s'", referent->buf);
+
+	}
+
+	if (!is_referent_root && check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT_NAME,
 				      "points to invalid refname '%s'", referent->buf);
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index ee1e5f2864..692b30727a 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -366,6 +366,35 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'the target of the textual symref should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	for good_referent in "refs/heads/branch" "HEAD" "refs/tags/tag"
+	do
+		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
+		git refs verify 2>err &&
+		rm $branch_dir_prefix/branch-good &&
+		test_must_be_empty err || return 1
+	done &&
+
+	for nonref_referent in "refs-back/heads/branch" "refs-back/tags/tag" "reflogs/refs/heads/branch"
+	do
+		printf "ref: %s\n" $nonref_referent >$branch_dir_prefix/branch-bad-1 &&
+		git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: refs/heads/branch-bad-1: symrefTargetIsNotARef: points to non-ref target '\''$nonref_referent'\''
+		EOF
+		rm $branch_dir_prefix/branch-bad-1 &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0


* [PATCH v7 9/9] ref: add symlink ref content check for files backend
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (7 preceding siblings ...)
  2024-11-10 12:10               ` [PATCH v7 8/9] ref: check whether the target of the symref is a ref shejialuo
@ 2024-11-10 12:10               ` shejialuo
  2024-11-13  7:36                 ` Patrick Steinhardt
  2024-11-13  7:36               ` [PATCH v7 0/9] add " Patrick Steinhardt
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
  10 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-11-10 12:10 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Besides textual symrefs, we also allow symbolic links as symrefs, so
we should provide the same consistency checks as we do for textual
symrefs. We are also considering deprecating writing symbolic links,
but we first need to assess whether they are still used. So, add a new
fsck message "symlinkRef(INFO)" to make users aware of this.

We have already introduced "files_fsck_symref_target", and we should
reuse it to handle symrefs which use legacy symbolic links. However, we
should not check for trailing garbage in symlink refs, so add a new
parameter "symbolic_link" to disable the checks that only apply to
textual symrefs.
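The newline check that "symbolic_link" skips can be sketched as follows. This is a hypothetical, self-contained C reduction of the condition in files_fsck_symref_target(), operating on a plain C string instead of a strbuf; rtrimmed_len() approximates strbuf_rtrim() by trimming trailing whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Length of s after dropping trailing whitespace, like strbuf_rtrim(). */
static size_t rtrimmed_len(const char *s, size_t len)
{
	while (len && isspace((unsigned char)s[len - 1]))
		len--;
	return len;
}

/*
 * Hypothetical helper: a textual referent is "missing a newline" if
 * nothing was trimmed, or if something was trimmed but the last byte
 * was not '\n'. Symlink refs have no file content, so skip the check.
 */
static bool referent_missing_newline(const char *referent, bool symbolic_link)
{
	size_t orig_len = strlen(referent);
	size_t len;

	if (symbolic_link || !orig_len)
		return false;
	len = rtrimmed_len(referent, orig_len);
	return len == orig_len ||
	       (len < orig_len && referent[orig_len - 1] != '\n');
}
```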

We also need to generate the "referent" parameter for
"files_fsck_symref_target" with the following steps:

1. Use "strbuf_add_real_path" to resolve the symlink and get the
   absolute path "ref_content" which the symlink ref points to.
2. Generate the absolute path "abs_gitdir" of "gitdir" and strip it
   from the front of "ref_content" to extract the relative path
   "relative_referent_path".
3. If "ref_content" is outside of "gitdir", we just set "referent" to
   "ref_content". Otherwise, we set "referent" to
   "relative_referent_path".
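Steps 2 and 3 boil down to a prefix strip. The following is a minimal C sketch under stated assumptions: both paths are already absolute and resolved (the patch uses strbuf_add_absolute_path() and strbuf_add_real_path() for that), and symlink_referent() is a hypothetical helper, not part of Git.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return the part of real_target below abs_gitdir,
 * or real_target unchanged when it lies outside abs_gitdir. The prefix
 * must end at a '/' boundary, so "/repo/.git" does not match
 * "/repo/.gitx/...".
 */
static const char *symlink_referent(const char *real_target,
				    const char *abs_gitdir)
{
	size_t len = strlen(abs_gitdir);
	int has_slash = len && abs_gitdir[len - 1] == '/';

	if (strncmp(real_target, abs_gitdir, len))
		return real_target;		/* outside gitdir */
	if (has_slash)
		return real_target + len;	/* "/repo/.git/" + rest */
	if (real_target[len] == '/')
		return real_target + len + 1;	/* "/repo/.git" + "/" + rest */
	return real_target;			/* e.g. "/repo/.gitx/..." */
}
```

Which gitdir is used as the prefix matters: with a main gitdir of "/repo/.git", a target of "/repo/.git/worktrees/wt1/refs/worktree/x" becomes "worktrees/wt1/refs/worktree/x", whereas stripping a worktree gitdir "/repo/.git/worktrees/wt1" would yield just "refs/worktree/x".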

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  6 +++++
 fsck.h                        |  1 +
 refs/files-backend.c          | 38 +++++++++++++++++++++++++----
 t/t0602-reffiles-fsck.sh      | 45 +++++++++++++++++++++++++++++++++++
 4 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index f82ebc58e8..b14bc44ca4 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,12 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symlinkRef`::
+	(INFO) A symbolic link is used as a symref. Report to the
+	git@vger.kernel.org mailing list if you see this error, as we
+	are assessing the feasibility of dropping support for
+	creating symbolic links as symrefs.
+
 `symrefTargetIsNotARef`::
 	(INFO) The target of a symbolic reference points neither to
 	a root reference nor to a reference starting with "refs/".
diff --git a/fsck.h b/fsck.h
index 53a47612e6..a44c231a5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -86,6 +86,7 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
 	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b3ec409920..37c669a30f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,6 +1,7 @@
 #define USE_THE_REPOSITORY_VARIABLE
 
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../config.h"
 #include "../copy.h"
 #include "../environment.h"
@@ -3511,7 +3512,8 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
-				    struct strbuf *referent)
+				    struct strbuf *referent,
+				    unsigned int symbolic_link)
 {
 	int is_referent_root;
 	char orig_last_byte;
@@ -3520,7 +3522,8 @@ static int files_fsck_symref_target(struct fsck_options *o,
 
 	orig_len = referent->len;
 	orig_last_byte = referent->buf[orig_len - 1];
-	strbuf_rtrim(referent);
+	if (!symbolic_link)
+		strbuf_rtrim(referent);
 
 	is_referent_root = is_root_ref(referent->buf);
 	if (!is_referent_root &&
@@ -3539,6 +3542,9 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
+	if (symbolic_link)
+		goto out;
+
 	if (referent->len == orig_len ||
 	    (referent->len < orig_len && orig_last_byte != '\n')) {
 		ret = fsck_report_ref(o, report,
@@ -3562,6 +3568,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct dir_iterator *iter)
 {
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
 	const char *trailing = NULL;
@@ -3572,8 +3579,30 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	report.path = target_name;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char* relative_referent_path = NULL;
+
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&ref_content, iter->path.buf);
+		skip_prefix(ref_content.buf, abs_gitdir.buf,
+			    &relative_referent_path);
+
+		if (relative_referent_path)
+			strbuf_addstr(&referent, relative_referent_path);
+		else
+			strbuf_addbuf(&referent, &ref_content);
+
+		ret |= files_fsck_symref_target(o, &report, &referent, 1);
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		ret = fsck_report_ref(o, &report,
@@ -3607,13 +3636,14 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 			goto cleanup;
 		}
 	} else {
-		ret = files_fsck_symref_target(o, &report, &referent);
+		ret = files_fsck_symref_target(o, &report, &referent, 0);
 		goto cleanup;
 	}
 
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 692b30727a..0d5eda6d22 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -395,6 +395,51 @@ test_expect_success 'the target of the textual symref should be checked' '
 	done
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-escape $branch_dir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   " $branch_dir_prefix/branch-symbolic-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-bad: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-bad: badReferentName: points to invalid refname '\''refs/heads/branch   '\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-bad &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferentName: points to invalid refname '\''refs/tags/.tag'\''
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0


* Re: [PATCH v7 0/9] add ref content check for files backend
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (8 preceding siblings ...)
  2024-11-10 12:10               ` [PATCH v7 9/9] ref: add symlink ref content check for files backend shejialuo
@ 2024-11-13  7:36               ` Patrick Steinhardt
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
  10 siblings, 0 replies; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-13  7:36 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Nov 10, 2024 at 08:07:36PM +0800, shejialuo wrote:
> Hi All:
> 
> This new version solves the following problems:
> 
> 1. Enhance the commit message suggested by Patrick.
> 2. Rename "target_name" to "refname".
> 3. Enhance the shell scripts to use `for in` to avoid repetition. And
> this is the main change of this new version.
> 
> Thanks,
> Jialuo

I've got two more comments, but otherwise this series looks close now.
Thanks!

Patrick

* Re: [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-11-10 12:09               ` [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-11-13  7:36                 ` Patrick Steinhardt
  2024-11-14 12:09                   ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-13  7:36 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Nov 10, 2024 at 08:09:51PM +0800, shejialuo wrote:
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8bfdce64bc..2d126ecbbe 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3505,6 +3505,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refname,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *target_name,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct fsck_ref_report report = { 0 };
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	report.path = target_name;
> +
> +	if (S_ISLNK(iter->st.st_mode))
> +		goto cleanup;
> +
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_BAD_REF_CONTENT,
> +				      "cannot read ref file '%s': %s",
> +				      iter->path.buf, strerror(errno));
> +		goto cleanup;
> +	}

I didn't catch this in previous rounds, but it's a little dubious
whether we should report this as an actual fsck error. I can expect
multiple situations:

  - The file has weird permissions and thus cannot be read, failing with
    EACCES, which doesn't match well with BAD_REF_CONTENT.

  - The file does not exist anymore because we were racing with a
    concurrent writer, failing with ENOENT. This is benign and expected
    to happen in busy repos, so generating an error here feels wrong.

  - The file cannot be read at all due to an I/O error. This may be
    reported with BAD_REF_CONTENT, but conflating this with the case
    where we have actually bad content may not be the best idea.

So maybe we should ignore ENOENT, report bad permissions and otherwise
return an actual error to the caller?
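The triage suggested above could look roughly like the following C sketch. The enum and classify_read_failure() are hypothetical names for illustration only; treating EPERM alongside EACCES as a permission problem is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes for a failed read of a loose ref file. */
enum read_failure_action {
	READ_FAILURE_IGNORE,      /* ref vanished, e.g. concurrent deletion */
	READ_FAILURE_REPORT_PERM, /* unreadable because of permissions */
	READ_FAILURE_ERROR,       /* anything else, such as EIO */
};

static enum read_failure_action classify_read_failure(int err)
{
	switch (err) {
	case ENOENT:
		return READ_FAILURE_IGNORE;
	case EACCES:
	case EPERM:
		return READ_FAILURE_REPORT_PERM;
	default:
		return READ_FAILURE_ERROR;
	}
}
```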

Patrick

* Re: [PATCH v7 9/9] ref: add symlink ref content check for files backend
  2024-11-10 12:10               ` [PATCH v7 9/9] ref: add symlink ref content check for files backend shejialuo
@ 2024-11-13  7:36                 ` Patrick Steinhardt
  2024-11-14 12:18                   ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-13  7:36 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Sun, Nov 10, 2024 at 08:10:27PM +0800, shejialuo wrote:
> @@ -3572,8 +3579,30 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
>  
>  	report.path = target_name;
>  
> -	if (S_ISLNK(iter->st.st_mode))
> +	if (S_ISLNK(iter->st.st_mode)) {
> +		const char* relative_referent_path = NULL;

Nit: the asterisk should stick with the variable name.

> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_SYMLINK_REF,
> +				      "use deprecated symbolic link for symref");
> +
> +		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
> +		strbuf_normalize_path(&abs_gitdir);
> +		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
> +			strbuf_addch(&abs_gitdir, '/');
> +
> +		strbuf_add_real_path(&ref_content, iter->path.buf);
> +		skip_prefix(ref_content.buf, abs_gitdir.buf,
> +			    &relative_referent_path);
> +
> +		if (relative_referent_path)
> +			strbuf_addstr(&referent, relative_referent_path);
> +		else
> +			strbuf_addbuf(&referent, &ref_content);
> +
> +		ret |= files_fsck_symref_target(o, &report, &referent, 1);
>  		goto cleanup;
> +	}

I wonder whether this logic works as expected with per-worktree symbolic
refs which are a symlink. On the other hand I wonder whether those work
as expected in the first place. Probably not. *shrug*

In any case, it would be nice to have a test for this.

Patrick

* Re: [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-11-13  7:36                 ` Patrick Steinhardt
@ 2024-11-14 12:09                   ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 12:09 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Nov 13, 2024 at 08:36:12AM +0100, Patrick Steinhardt wrote:
> On Sun, Nov 10, 2024 at 08:09:51PM +0800, shejialuo wrote:
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 8bfdce64bc..2d126ecbbe 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3505,6 +3505,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
> >  				  const char *refname,
> >  				  struct dir_iterator *iter);
> >  
> > +static int files_fsck_refs_content(struct ref_store *ref_store,
> > +				   struct fsck_options *o,
> > +				   const char *target_name,
> > +				   struct dir_iterator *iter)
> > +{
> > +	struct strbuf ref_content = STRBUF_INIT;
> > +	struct strbuf referent = STRBUF_INIT;
> > +	struct fsck_ref_report report = { 0 };
> > +	unsigned int type = 0;
> > +	int failure_errno = 0;
> > +	struct object_id oid;
> > +	int ret = 0;
> > +
> > +	report.path = target_name;
> > +
> > +	if (S_ISLNK(iter->st.st_mode))
> > +		goto cleanup;
> > +
> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> > +		ret = fsck_report_ref(o, &report,
> > +				      FSCK_MSG_BAD_REF_CONTENT,
> > +				      "cannot read ref file '%s': %s",
> > +				      iter->path.buf, strerror(errno));
> > +		goto cleanup;
> > +	}
> 
> I didn't catch this in previous rounds, but it's a little dubious
> whether we should report this as an actual fsck error. I can expect
> multiple situations:
> 
>   - The file has weird permissions and thus cannot be read, failing with
>     EACCES, which doesn't match well with BAD_REF_CONTENT.
> 
>   - The file does not exist anymore because we were racing with a
>     concurrent writer, failing with ENOENT. This is benign and expected
>     to happen in busy repos, so generating an error here feels wrong.
> 
>   - The file cannot be read at all due to an I/O error. This may be
>     reported with BAD_REF_CONTENT, but conflating this with the case
>     where we have actually bad content may not be the best idea.
> 
> So maybe we should ignore ENOENT, report bad permissions and otherwise
> return an actual error to the caller?
> 

So I think we should just use "error_errno" to report the actual error
to the caller, and we also need to add some comments.
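One detail worth keeping in mind: as far as we understand Git's usage.c, error_errno() itself appends ": " followed by strerror(errno) to the formatted message and returns -1, so the caller should not also pass strerror(errno) as an argument. The hypothetical sketch_error_errno() below mimics that behavior into a buffer to show the resulting message; it is an illustration, not Git's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an error_errno()-style helper: format the
 * caller's message, append ": " and strerror(errno), and return -1 so
 * callers can write "ret = sketch_error_errno(...)".
 */
static int sketch_error_errno(char *out, size_t out_size, const char *fmt, ...)
{
	int saved_errno = errno; /* formatting may clobber errno */
	va_list ap;
	size_t used;

	if (!out_size)
		return -1;

	va_start(ap, fmt);
	vsnprintf(out, out_size, fmt, ap);
	va_end(ap);

	used = strlen(out);
	snprintf(out + used, out_size - used, ": %s", strerror(saved_errno));
	return -1;
}
```

With such a helper, a format string like "cannot read ref file '%s'" is enough; adding a second "%s" for strerror(errno) would print the error text twice.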

Thanks for this wonderful suggestion.


> Patrick

* Re: [PATCH v7 9/9] ref: add symlink ref content check for files backend
  2024-11-13  7:36                 ` Patrick Steinhardt
@ 2024-11-14 12:18                   ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 12:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Nov 13, 2024 at 08:36:16AM +0100, Patrick Steinhardt wrote:
> On Sun, Nov 10, 2024 at 08:10:27PM +0800, shejialuo wrote:
> > @@ -3572,8 +3579,30 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
> >  
> >  	report.path = target_name;
> >  
> > -	if (S_ISLNK(iter->st.st_mode))
> > +	if (S_ISLNK(iter->st.st_mode)) {
> > +		const char* relative_referent_path = NULL;
> 
> Nit: the asterisk should stick with the variable name.
> 

I will improve this in the next version.

> > +		ret = fsck_report_ref(o, &report,
> > +				      FSCK_MSG_SYMLINK_REF,
> > +				      "use deprecated symbolic link for symref");
> > +
> > +		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
> > +		strbuf_normalize_path(&abs_gitdir);
> > +		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
> > +			strbuf_addch(&abs_gitdir, '/');
> > +
> > +		strbuf_add_real_path(&ref_content, iter->path.buf);
> > +		skip_prefix(ref_content.buf, abs_gitdir.buf,
> > +			    &relative_referent_path);
> > +
> > +		if (relative_referent_path)
> > +			strbuf_addstr(&referent, relative_referent_path);
> > +		else
> > +			strbuf_addbuf(&referent, &ref_content);
> > +
> > +		ret |= files_fsck_symref_target(o, &report, &referent, 1);
> >  		goto cleanup;
> > +	}
> 
> I wonder whether this logic works as expected with per-worktree symbolic
> refs which are a symlink. On the other hand I wonder whether those work
> as expected in the first place. Probably not. *shrug*
> 
> In any case, it would be nice to have a test for this.
> 

Correct, I overlooked this because I added worktree support in a later
version. Let me add a new test to verify this.

> Patrick

Thanks,
Jialuo

* [PATCH v8 0/9] add ref content check for files backend
  2024-11-10 12:07             ` [PATCH v7 " shejialuo
                                 ` (9 preceding siblings ...)
  2024-11-13  7:36               ` [PATCH v7 0/9] add " Patrick Steinhardt
@ 2024-11-14 16:51               ` shejialuo
  2024-11-14 16:53                 ` [PATCH v8 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
                                   ` (10 more replies)
  10 siblings, 11 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:51 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi all:

This new version solves the following problems:

1. When reading the content of the ref file fails, we no longer use the
"fsck_report_ref" function, as it is not suitable here.
2. Add a new test for symlink refs in worktrees in the last patch.
Writing the test uncovered a bug, which is fixed as described below.

Because we have introduced the check for worktrees, we should not use
"ref_store->gitdir"; instead we need to use "ref_store->repo->gitdir" to
get the main worktree "gitdir". After fixing this, the test passes.

Thanks to Patrick for reminding me about this. I forgot to add a test
and thus made a mistake.

Thanks,
Jialuo

shejialuo (9):
  ref: initialize "fsck_ref_report" with zero
  ref: check the full refname instead of basename
  ref: initialize ref name outside of check functions
  ref: support multiple worktrees check for refs
  ref: port git-fsck(1) regular refs check for files backend
  ref: add more strict checks for regular refs
  ref: add basic symref content check for files backend
  ref: check whether the target of the symref is a ref
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  35 +++
 builtin/refs.c                |  10 +-
 fsck.h                        |   6 +
 refs.c                        |   7 +-
 refs.h                        |   3 +-
 refs/debug.c                  |   5 +-
 refs/files-backend.c          | 195 +++++++++++-
 refs/packed-backend.c         |   8 +-
 refs/refs-internal.h          |   5 +-
 refs/reftable-backend.c       |   3 +-
 t/t0602-reffiles-fsck.sh      | 576 ++++++++++++++++++++++++++++++++--
 11 files changed, 791 insertions(+), 62 deletions(-)

Range-diff against v7:
 1:  bfb2a21af4 =  1:  bfb2a21af4 ref: initialize "fsck_ref_report" with zero
 2:  9efc83f7ea =  2:  9efc83f7ea ref: check the full refname instead of basename
 3:  5ea7d18203 =  3:  5ea7d18203 ref: initialize ref name outside of check functions
 4:  cb4669b64d =  4:  cb4669b64d ref: support multiple worktrees check for refs
 5:  4e1add6465 !  5:  c6c128c922 ref: port git-fsck(1) regular refs check for files backend
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +	if (S_ISLNK(iter->st.st_mode))
     +		goto cleanup;
     +
    -+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
    -+		ret = fsck_report_ref(o, &report,
    -+				      FSCK_MSG_BAD_REF_CONTENT,
    -+				      "cannot read ref file '%s': %s",
    -+				      iter->path.buf, strerror(errno));
    ++	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {
    ++		/*
    ++		 * Ref file could be removed by another concurrent process. We should
    ++		 * ignore this error and continue to the next ref.
    ++		 */
    ++		if (errno == ENOENT)
    ++			goto cleanup;
    ++
    ++		ret = error_errno(_("cannot read ref file '%s': %s"),
    ++				  iter->path.buf, strerror(errno));
     +		goto cleanup;
     +	}
     +
 6:  945322fab7 =  6:  911fa42717 ref: add more strict checks for regular refs
 7:  3006eb9431 =  7:  7aa6a99206 ref: add basic symref content check for files backend
 8:  c59d003d78 =  8:  dbb0787ad1 ref: check whether the target of the symref is a ref
 9:  bb6d7f3323 !  9:  a6d85b4864 ref: add symlink ref content check for files backend
    @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_s
      
     -	if (S_ISLNK(iter->st.st_mode))
     +	if (S_ISLNK(iter->st.st_mode)) {
    -+		const char* relative_referent_path = NULL;
    ++		const char *relative_referent_path = NULL;
     +
     +		ret = fsck_report_ref(o, &report,
     +				      FSCK_MSG_SYMLINK_REF,
     +				      "use deprecated symbolic link for symref");
     +
    -+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
    ++		strbuf_add_absolute_path(&abs_gitdir, ref_store->repo->gitdir);
     +		strbuf_normalize_path(&abs_gitdir);
     +		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
     +			strbuf_addch(&abs_gitdir, '/');
    @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_s
      		goto cleanup;
     +	}
      
    - 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
    - 		ret = fsck_report_ref(o, &report,
    + 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {
    + 		/*
     @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
      			goto cleanup;
      		}
    @@ t/t0602-reffiles-fsck.sh: test_expect_success 'the target of the textual symref
     +	rm $tag_dir_prefix/tag-symbolic-1 &&
     +	test_cmp expect err
     +'
    ++
    ++test_expect_success SYMLINKS 'symlink symref content should be checked (worktree)' '
    ++	test_when_finished "rm -rf repo" &&
    ++	git init repo &&
    ++	cd repo &&
    ++	test_commit default &&
    ++	git branch branch-1 &&
    ++	git branch branch-2 &&
    ++	git branch branch-3 &&
    ++	git worktree add ./worktree-1 branch-2 &&
    ++	git worktree add ./worktree-2 branch-3 &&
    ++	main_worktree_refdir_prefix=.git/refs/heads &&
    ++	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
    ++	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
    ++
    ++	(
    ++		cd worktree-1 &&
    ++		git update-ref refs/worktree/branch-4 refs/heads/branch-1
    ++	) &&
    ++	(
    ++		cd worktree-2 &&
    ++		git update-ref refs/worktree/branch-4 refs/heads/branch-1
    ++	) &&
    ++
    ++	ln -sf ../../../../refs/heads/good-branch $worktree1_refdir_prefix/branch-symbolic-good &&
    ++	git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	warning: worktrees/worktree-1/refs/worktree/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
    ++	EOF
    ++	rm $worktree1_refdir_prefix/branch-symbolic-good &&
    ++	test_cmp expect err &&
    ++
    ++	ln -sf ../../../../worktrees/worktree-1/good-branch $worktree2_refdir_prefix/branch-symbolic-good &&
    ++	git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	warning: worktrees/worktree-2/refs/worktree/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
    ++	EOF
    ++	rm $worktree2_refdir_prefix/branch-symbolic-good &&
    ++	test_cmp expect err &&
    ++
    ++	ln -sf ../../worktrees/worktree-2/good-branch $main_worktree_refdir_prefix/branch-symbolic-good &&
    ++	git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
    ++	EOF
    ++	rm $main_worktree_refdir_prefix/branch-symbolic-good &&
    ++	test_cmp expect err &&
    ++
    ++	ln -sf ../../../../logs/branch-escape $worktree1_refdir_prefix/branch-symbolic &&
    ++	git refs verify 2>err &&
    ++	cat >expect <<-EOF &&
    ++	warning: worktrees/worktree-1/refs/worktree/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
    ++	warning: worktrees/worktree-1/refs/worktree/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
    ++	EOF
    ++	rm $worktree1_refdir_prefix/branch-symbolic &&
    ++	test_cmp expect err &&
    ++
    ++	for bad_referent_name in ".tag" "branch   "
    ++	do
    ++		ln -sf ./"$bad_referent_name" $worktree1_refdir_prefix/bad-symbolic &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: worktrees/worktree-1/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
    ++		error: worktrees/worktree-1/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''worktrees/worktree-1/refs/worktree/$bad_referent_name'\''
    ++		EOF
    ++		rm $worktree1_refdir_prefix/bad-symbolic &&
    ++		test_cmp expect err &&
    ++
    ++		ln -sf ../../../../refs/heads/"$bad_referent_name" $worktree1_refdir_prefix/bad-symbolic &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: worktrees/worktree-1/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
    ++		error: worktrees/worktree-1/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''refs/heads/$bad_referent_name'\''
    ++		EOF
    ++		rm $worktree1_refdir_prefix/bad-symbolic &&
    ++		test_cmp expect err &&
    ++
    ++		ln -sf ./"$bad_referent_name" $worktree2_refdir_prefix/bad-symbolic &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: worktrees/worktree-2/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
    ++		error: worktrees/worktree-2/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''worktrees/worktree-2/refs/worktree/$bad_referent_name'\''
    ++		EOF
    ++		rm $worktree2_refdir_prefix/bad-symbolic &&
    ++		test_cmp expect err &&
    ++
    ++		ln -sf ../../../../refs/heads/"$bad_referent_name" $worktree2_refdir_prefix/bad-symbolic &&
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: worktrees/worktree-2/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
    ++		error: worktrees/worktree-2/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''refs/heads/$bad_referent_name'\''
    ++		EOF
    ++		rm $worktree2_refdir_prefix/bad-symbolic &&
    ++		test_cmp expect err || return 1
    ++	done
    ++'
     +
      test_expect_success 'ref content checks should work with worktrees' '
      	test_when_finished "rm -rf repo" &&
-- 
2.47.0



* [PATCH v8 1/9] ref: initialize "fsck_ref_report" with zero
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
@ 2024-11-14 16:53                 ` shejialuo
  2024-11-14 16:54                 ` [PATCH v8 2/9] ref: check the full refname instead of basename shejialuo
                                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:53 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So, we must always initialize these members to
NULL when creating a new "fsck_ref_report" structure, rather than
leaving them with indeterminate values.

The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL (which implicitly 0-initializes other
members in the struct). It is more customary to use "{ 0 }" to express
that we are 0-initializing everything. In order to align with the
codebase, initialize "fsck_ref_report" with zero.
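
As a minimal illustration, assuming a simplified stand-in struct (the
name "report_sketch" and its layout are hypothetical, not the real
definition from "fsck.h"), both initializers leave every pointer member
NULL, which is what lets the report function test "oid" and "referent"
safely:

```c
#include <stddef.h>

/* Hypothetical stand-in for "struct fsck_ref_report"; only the members
 * named in this message are modeled. */
struct report_sketch {
	const char *path;
	const void *oid;
	const char *referent;
};

/* Designated initializer: "path" is set explicitly and the remaining
 * members are implicitly zero-initialized (NULL for pointers). */
static struct report_sketch make_with_designator(void)
{
	struct report_sketch r = { .path = NULL };
	return r;
}

/* "{ 0 }" states directly that every member is zero-initialized. */
static struct report_sketch make_with_zero(void)
{
	struct report_sketch r = { 0 };
	return r;
}

/* The kind of test a report function needs: count the optional
 * members that were actually set. */
static int count_optional_fields(const struct report_sketch *r)
{
	return (r->oid != NULL) + (r->referent != NULL);
}
```

Both forms behave the same; "{ 0 }" is simply the more conventional
spelling.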

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0824c0b8a9..03d2503276 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,7 +3520,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.47.0



* [PATCH v8 2/9] ref: check the full refname instead of basename
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
  2024-11-14 16:53                 ` [PATCH v8 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-11-14 16:54                 ` shejialuo
  2024-11-14 16:54                 ` [PATCH v8 3/9] ref: initialize ref name outside of check functions shejialuo
                                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:54 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "files-backend.c::files_fsck_refs_name", we validate the refname
format by using "check_refname_format" to check the basename of the
iterator with "REFNAME_ALLOW_ONELEVEL" flag.

However, this is a bad implementation. Although we don't allow a
single "@" in the ".git" directory, we do allow "refs/heads/@". So we
wrongly report an error for a "refs/heads/@" ref, because we only
check the one-level refname "@".

Because we only check a one-level refname, we also cannot check the
other components of the full refname, so we miss errors such as the
following:

  "refs/heads/ new-feature/test"
  "refs/heads/~new-feature/test"

To fix the above problems, enhance "files_fsck_refs_name" to pass the
full refname to "check_refname_format". Then, replace the tests related
to "@" and add tests that exercise the above situations, using a for
loop to avoid repetition.
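
The difference can be sketched with a toy checker. This is NOT
"check_refname_format": the hypothetical "toy_check_refname" below
implements only a small subset of the documented refname rules (no
component may be empty or start with '.', no ' ' or '~', the whole name
may not be "@", and without "allow_onelevel" at least one '/' is
required), just enough to show why the basename alone is misleading:

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy subset of refname rules. Returns 0 if "refname" passes,
 * -1 otherwise. "allow_onelevel" plays the role of the
 * REFNAME_ALLOW_ONELEVEL flag.
 */
static int toy_check_refname(const char *refname, int allow_onelevel)
{
	const char *p = refname;
	int components = 0;

	if (!strcmp(refname, "@"))
		return -1; /* a lone "@" is never a valid refname */

	for (;;) {
		const char *end = strchr(p, '/');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* empty component (e.g. trailing '/') or leading dot */
		if (len == 0 || p[0] == '.')
			return -1;
		for (size_t i = 0; i < len; i++)
			if (p[i] == ' ' || p[i] == '~')
				return -1;
		components++;
		if (!end)
			break;
		p = end + 1;
	}

	if (components < 2 && !allow_onelevel)
		return -1;
	return 0;
}
```

Checking the basename "@" as a one-level name fails even though
"refs/heads/@" is fine, while the basename "test" passes and hides the
bad component in "refs/heads/ new-feature/test".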

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     |  7 ++-
 t/t0602-reffiles-fsck.sh | 92 ++++++++++++++++++++++++----------------
 2 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 03d2503276..b055edc061 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3519,10 +3519,13 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
 		goto cleanup;
 
-	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
+	/*
+	 * This works right now because we never check the root refs.
+	 */
+	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
+	if (check_refname_format(sb.buf, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..2a172c913d 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -18,63 +18,81 @@ test_expect_success 'ref name should be checked' '
 	cd repo &&
 
 	git commit --allow-empty -m initial &&
-	git checkout -b branch-1 &&
-	git tag tag-1 &&
-	git commit --allow-empty -m second &&
-	git checkout -b branch-2 &&
-	git tag tag-2 &&
-	git tag multi_hierarchy/tag-2 &&
+	git checkout -b default-branch &&
+	git tag default-tag &&
+	git tag multi_hierarchy/default-tag &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/heads/.branch-1: badRefName: invalid refname format
-	EOF
-	rm $branch_dir_prefix/.branch-1 &&
-	test_cmp expect err &&
-
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/heads/@: badRefName: invalid refname format
-	EOF
+	cp $branch_dir_prefix/default-branch $branch_dir_prefix/@ &&
+	git refs verify 2>err &&
+	test_must_be_empty err &&
 	rm $branch_dir_prefix/@ &&
-	test_cmp expect err &&
 
-	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/@ &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/tags/multi_hierarchy/@: badRefName: invalid refname format
-	EOF
-	rm $tag_dir_prefix/multi_hierarchy/@ &&
-	test_cmp expect err &&
-
-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
+	cp $tag_dir_prefix/default-tag $tag_dir_prefix/tag-1.lock &&
 	git refs verify 2>err &&
 	rm $tag_dir_prefix/tag-1.lock &&
 	test_must_be_empty err &&
 
-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/.lock &&
+	cp $tag_dir_prefix/default-tag $tag_dir_prefix/.lock &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/tags/.lock: badRefName: invalid refname format
 	EOF
 	rm $tag_dir_prefix/.lock &&
-	test_cmp expect err
+	test_cmp expect err &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/$refname: badRefName: invalid refname format
+		EOF
+		rm "$branch_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $tag_dir_prefix/default-tag "$tag_dir_prefix/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/$refname: badRefName: invalid refname format
+		EOF
+		rm "$tag_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $tag_dir_prefix/multi_hierarchy/default-tag "$tag_dir_prefix/multi_hierarchy/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/multi_hierarchy/$refname: badRefName: invalid refname format
+		EOF
+		rm "$tag_dir_prefix/multi_hierarchy/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		mkdir "$branch_dir_prefix/$refname" &&
+		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname/default-branch" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/$refname/default-branch: badRefName: invalid refname format
+		EOF
+		rm -r "$branch_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done
 '
 
 test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	branch_dir_prefix=.git/refs/heads &&
-	tag_dir_prefix=.git/refs/tags &&
 	cd repo &&
 	git commit --allow-empty -m initial &&
 	git checkout -b branch-1 &&
-	git tag tag-1 &&
-	git commit --allow-empty -m second &&
-	git checkout -b branch-2 &&
-	git tag tag-2 &&
 
 	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	git -c fsck.badRefName=warn refs verify 2>err &&
@@ -84,7 +102,7 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	rm $branch_dir_prefix/.branch-1 &&
 	test_cmp expect err &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	git -c fsck.badRefName=ignore refs verify 2>err &&
 	test_must_be_empty err
 '
-- 
2.47.0



* [PATCH v8 3/9] ref: initialize ref name outside of check functions
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
  2024-11-14 16:53                 ` [PATCH v8 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-11-14 16:54                 ` [PATCH v8 2/9] ref: check the full refname instead of basename shejialuo
@ 2024-11-14 16:54                 ` shejialuo
  2024-11-14 16:54                 ` [PATCH v8 4/9] ref: support multiple worktrees check for refs shejialuo
                                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:54 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We pass "refs_check_dir" to the "files_fsck_refs_name" function, which
allows it to build the checked ref name later. However, whenever we
introduce a new check function, we would have to allocate redundant
memory and re-calculate the ref name. Allocating redundant memory and
duplicating logic is wasteful. Instead, we should allocate and calculate
the name only once and pass it to the check functions.

To avoid the repeated calculation, rename "refs_check_dir" to "refname".
In "files_fsck_refs_dir", create a new strbuf "refname" so that, for
each ref we handle, we calculate the name once and then call the check
functions one by one.
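
The pattern can be sketched in plain C, assuming a fixed-size buffer in
place of a strbuf. The names "check_fn", "check_not_empty",
"check_no_space" and "run_checks" are hypothetical and only loosely
modeled on "files_fsck_refs_fn" and the loop in "files_fsck_refs_dir":

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check type: each check receives the precomputed name. */
typedef int (*check_fn)(const char *refname);

static int check_not_empty(const char *refname)
{
	return refname[0] ? 0 : -1;
}

static int check_no_space(const char *refname)
{
	return strchr(refname, ' ') ? -1 : 0;
}

/*
 * Build "<dir>/<relative_path>" once per ref, then hand the same buffer
 * to every check in the NULL-terminated array. Returns -1 if any check
 * failed, without stopping at the first failure.
 */
static int run_checks(const char *dir, const char *relative_path,
		      check_fn *fns)
{
	char refname[256];
	int ret = 0;
	int n = snprintf(refname, sizeof(refname), "%s/%s",
			 dir, relative_path);

	if (n < 0 || (size_t)n >= sizeof(refname))
		return -1; /* name does not fit this sketch's buffer */

	for (size_t i = 0; fns[i]; i++)
		if (fns[i](refname))
			ret = -1;
	return ret;
}
```

Adding a new check then only means appending a function to the array;
the name is still formatted exactly once per ref.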

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index b055edc061..8edb700568 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3501,12 +3501,12 @@ static int files_ref_store_remove_on_disk(struct ref_store *ref_store,
  */
 typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  struct fsck_options *o,
-				  const char *refs_check_dir,
+				  const char *refname,
 				  struct dir_iterator *iter);
 
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
-				const char *refs_check_dir,
+				const char *refname,
 				struct dir_iterator *iter)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -3522,11 +3522,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	/*
 	 * This works right now because we never check the root refs.
 	 */
-	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
-	if (check_refname_format(sb.buf, 0)) {
+	if (check_refname_format(refname, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		report.path = sb.buf;
+		report.path = refname;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
 				      "invalid refname format");
@@ -3542,6 +3541,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       const char *refs_check_dir,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
+	struct strbuf refname = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	struct dir_iterator *iter;
 	int iter_status;
@@ -3560,11 +3560,15 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			continue;
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
+			strbuf_reset(&refname);
+			strbuf_addf(&refname, "%s/%s", refs_check_dir,
+				    iter->relative_path);
+
 			if (o->verbose)
-				fprintf_ln(stderr, "Checking %s/%s",
-					   refs_check_dir, iter->relative_path);
+				fprintf_ln(stderr, "Checking %s", refname.buf);
+
 			for (size_t i = 0; fsck_refs_fn[i]; i++) {
-				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
+				if (fsck_refs_fn[i](ref_store, o, refname.buf, iter))
 					ret = -1;
 			}
 		} else {
@@ -3581,6 +3585,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 
 out:
 	strbuf_release(&sb);
+	strbuf_release(&refname);
 	return ret;
 }
 
-- 
2.47.0



* [PATCH v8 4/9] ref: support multiple worktrees check for refs
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (2 preceding siblings ...)
  2024-11-14 16:54                 ` [PATCH v8 3/9] ref: initialize ref name outside of check functions shejialuo
@ 2024-11-14 16:54                 ` shejialuo
  2024-11-14 16:54                 ` [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
                                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:54 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already set up the infrastructure to check the consistency of
refs, but we do not support multiple worktrees. However, "git-fsck(1)"
checks the refs of worktrees. As we aim for feature parity with
"git-fsck(1)", we need to add support for multiple worktrees.

Because each worktree has its own specific refs, instead of just showing
users "refs/worktree/foo", we need to display the full name such as
"worktrees/<id>/refs/worktree/foo". So we need to know the id of the
worktree to get the full name. Add a new parameter "struct worktree *"
to "refs-internal.h::fsck_fn". Then change the related functions to
follow this new interface.

The "packed-refs" file only exists in the main worktree, so we should
only check "packed-refs" in the main worktree. Use the
"is_main_worktree" function to skip checking "packed-refs" in the
"packed_fsck" function.

Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add
the "worktrees/<id>/" prefix when we are not in the main worktree.

Lastly, add a new test that checks refnames when there are multiple
worktrees to exercise the code.
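
The naming rule can be sketched as follows, assuming a NULL worktree id
stands in for "is_main_worktree()" returning true. The functions
"format_refname" and "should_check_packed_refs" are hypothetical
illustrations, not part of the patch:

```c
#include <stdio.h>

/*
 * Assemble the displayed refname. "worktree_id" is NULL for the main
 * worktree, otherwise the linked worktree's id. Returns the
 * snprintf() result (the length the full name would have).
 */
static int format_refname(char *buf, size_t size, const char *worktree_id,
			  const char *refs_check_dir,
			  const char *relative_path)
{
	if (!worktree_id)
		return snprintf(buf, size, "%s/%s",
				refs_check_dir, relative_path);
	return snprintf(buf, size, "worktrees/%s/%s/%s", worktree_id,
			refs_check_dir, relative_path);
}

/* "packed-refs" lives only in the main worktree, so skip it elsewhere. */
static int should_check_packed_refs(const char *worktree_id)
{
	return worktree_id == NULL;
}
```

For a linked worktree with id "worktree-1", the ref at
"refs/worktree/branch-4" is therefore reported as
"worktrees/worktree-1/refs/worktree/branch-4", matching the expected
output in the new test.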

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/refs.c           | 10 ++++++--
 refs.c                   |  5 ++--
 refs.h                   |  3 ++-
 refs/debug.c             |  5 ++--
 refs/files-backend.c     | 17 ++++++++++----
 refs/packed-backend.c    |  8 ++++++-
 refs/refs-internal.h     |  3 ++-
 refs/reftable-backend.c  |  3 ++-
 t/t0602-reffiles-fsck.sh | 51 ++++++++++++++++++++++++++++++++++++++++
 9 files changed, 90 insertions(+), 15 deletions(-)

diff --git a/builtin/refs.c b/builtin/refs.c
index 24978a7b7b..394b4101c6 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "refs.h"
 #include "strbuf.h"
+#include "worktree.h"
 
 #define REFS_MIGRATE_USAGE \
 	N_("git refs migrate --ref-format=<format> [--dry-run]")
@@ -66,6 +67,7 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
 static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 {
 	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
+	struct worktree **worktrees;
 	const char * const verify_usage[] = {
 		REFS_VERIFY_USAGE,
 		NULL,
@@ -75,7 +77,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
 		OPT_END(),
 	};
-	int ret;
+	int ret = 0;
 
 	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
 	if (argc)
@@ -84,9 +86,13 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 	git_config(git_fsck_config, &fsck_refs_options);
 	prepare_repo_settings(the_repository);
 
-	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
+	worktrees = get_worktrees();
+	for (size_t i = 0; worktrees[i]; i++)
+		ret |= refs_fsck(get_worktree_ref_store(worktrees[i]),
+				 &fsck_refs_options, worktrees[i]);
 
 	fsck_options_clear(&fsck_refs_options);
+	free_worktrees(worktrees);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 5f729ed412..395a17273c 100644
--- a/refs.c
+++ b/refs.c
@@ -318,9 +318,10 @@ int check_refname_format(const char *refname, int flags)
 	return check_or_sanitize_refname(refname, flags, NULL);
 }
 
-int refs_fsck(struct ref_store *refs, struct fsck_options *o)
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt)
 {
-	return refs->be->fsck(refs, o);
+	return refs->be->fsck(refs, o, wt);
 }
 
 void sanitize_refname_component(const char *refname, struct strbuf *out)
diff --git a/refs.h b/refs.h
index 108dfc93b3..341d43239c 100644
--- a/refs.h
+++ b/refs.h
@@ -549,7 +549,8 @@ int check_refname_format(const char *refname, int flags);
  * reflogs are consistent, and non-zero otherwise. The errors will be
  * written to stderr.
  */
-int refs_fsck(struct ref_store *refs, struct fsck_options *o);
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt);
 
 /*
  * Apply the rules from check_refname_format, but mutate the result until it
diff --git a/refs/debug.c b/refs/debug.c
index 45e2e784a0..72e80ddd6d 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -420,10 +420,11 @@ static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
 }
 
 static int debug_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
-	int res = drefs->refs->be->fsck(drefs->refs, o);
+	int res = drefs->refs->be->fsck(drefs->refs, o, wt);
 	trace_printf_key(&trace_refs, "fsck: %d\n", res);
 	return res;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8edb700568..8bfdce64bc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -23,6 +23,7 @@
 #include "../dir.h"
 #include "../chdir-notify.h"
 #include "../setup.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../revision.h"
@@ -3539,6 +3540,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       struct fsck_options *o,
 			       const char *refs_check_dir,
+			       struct worktree *wt,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
 	struct strbuf refname = STRBUF_INIT;
@@ -3561,6 +3563,9 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
 			strbuf_reset(&refname);
+
+			if (!is_main_worktree(wt))
+				strbuf_addf(&refname, "worktrees/%s/", wt->id);
 			strbuf_addf(&refname, "%s/%s", refs_check_dir,
 				    iter->relative_path);
 
@@ -3590,7 +3595,8 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 }
 
 static int files_fsck_refs(struct ref_store *ref_store,
-			   struct fsck_options *o)
+			   struct fsck_options *o,
+			   struct worktree *wt)
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
@@ -3599,17 +3605,18 @@ static int files_fsck_refs(struct ref_store *ref_store,
 
 	if (o->verbose)
 		fprintf_ln(stderr, _("Checking references consistency"));
-	return files_fsck_refs_dir(ref_store, o,  "refs", fsck_refs_fn);
+	return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
 }
 
 static int files_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ, "fsck");
 
-	return files_fsck_refs(ref_store, o) |
-	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
+	return files_fsck_refs(ref_store, o, wt) |
+	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt);
 }
 
 struct ref_storage_be refs_be_files = {
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 07c57fd541..46dcaec654 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -13,6 +13,7 @@
 #include "../lockfile.h"
 #include "../chdir-notify.h"
 #include "../statinfo.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../trace2.h"
@@ -1754,8 +1755,13 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 }
 
 static int packed_fsck(struct ref_store *ref_store UNUSED,
-		       struct fsck_options *o UNUSED)
+		       struct fsck_options *o UNUSED,
+		       struct worktree *wt)
 {
+
+	if (!is_main_worktree(wt))
+		return 0;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..037d7991cd 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -653,7 +653,8 @@ typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refnam
 				 struct strbuf *referent);
 
 typedef int fsck_fn(struct ref_store *ref_store,
-		    struct fsck_options *o);
+		    struct fsck_options *o,
+		    struct worktree *wt);
 
 struct ref_storage_be {
 	const char *name;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index f5f957e6de..b6a63c1015 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2443,7 +2443,8 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 }
 
 static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
-			    struct fsck_options *o UNUSED)
+			    struct fsck_options *o UNUSED,
+			    struct worktree *wt UNUSED)
 {
 	return 0;
 }
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 2a172c913d..1e17393a3d 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -107,4 +107,55 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'ref name check should work for multiple worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	cd repo &&
+	test_commit initial &&
+	git checkout -b branch-1 &&
+	test_commit second &&
+	git checkout -b branch-2 &&
+	test_commit third &&
+	git checkout -b branch-3 &&
+	git worktree add ./worktree-1 branch-1 &&
+	git worktree add ./worktree-2 branch-2 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+
+	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/'\'' branch-5'\'' &&
+	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/'\''~branch-6'\'' &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+	error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err &&
+
+	for worktree in "worktree-1" "worktree-2"
+	do
+		(
+			cd $worktree &&
+			test_must_fail git refs verify 2>err &&
+			cat >expect <<-EOF &&
+			error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+			error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+			EOF
+			sort err >sorted_err &&
+			test_cmp expect sorted_err || return 1
+		)
+	done
+'
+
 test_done
-- 
2.47.0



* [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (3 preceding siblings ...)
  2024-11-14 16:54                 ` [PATCH v8 4/9] ref: support multiple worktrees check for refs shejialuo
@ 2024-11-14 16:54                 ` shejialuo
  2024-11-15  7:11                   ` Patrick Steinhardt
  2024-11-14 16:54                 ` [PATCH v8 6/9] ref: add more strict checks for regular refs shejialuo
                                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:54 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

"git-fsck(1)" implicitly checks the ref content by passing the
callback "fsck_handle_ref" to "refs.c::refs_for_each_rawref".
Then, it checks whether the ref content (eventually "oid") is
valid. If not, it reports the following error to the user.

  error: refs/heads/main: invalid sha1 pointer 0000...

It also wrongly reports the above error when there are dangling
symrefs in the repository. This does not align with the behavior of
the "git symbolic-ref" command, which allows users to create dangling
symrefs.

As we have already introduced the "git refs verify" command, we should
check the ref content explicitly in "git refs verify". Later, we could
then remove these checks from "git-fsck(1)" and have it launch a
subprocess that runs "git refs verify", which makes "git-fsck(1)"
cleaner.

Following what "git-fsck(1)" does, add a similar check to "git refs
verify". Then add a new fsck error message "badRefContent(ERROR)" to
indicate that a ref has invalid content.
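
To make "invalid content" concrete, here is a simplified classifier,
loosely following what "parse_loose_ref_contents" accepts. It is an
illustration under stated assumptions, not the real parser: the names
"classify_ref_content" and "enum ref_kind" are hypothetical, "hexsz" is
the hex length of an object ID (40 for SHA-1, 64 for SHA-256), and the
symref branch only checks the "ref:" prefix without validating the
target:

```c
#include <ctype.h>
#include <string.h>

enum ref_kind { REF_BAD, REF_REGULAR, REF_SYMBOLIC };

/*
 * - "ref: <target>" is treated as a symref.
 * - Exactly "hexsz" hex digits followed by end-of-string or whitespace
 *   is a regular ref.
 * - Anything else, such as "<oid>x", is bad content.
 */
static enum ref_kind classify_ref_content(const char *buf, size_t hexsz)
{
	if (!strncmp(buf, "ref:", 4))
		return REF_SYMBOLIC;

	/* Stops at the first non-hex byte, including a NUL terminator,
	 * so a short buffer is never read past its end. */
	for (size_t i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return REF_BAD;

	if (buf[hexsz] == '\0' || isspace((unsigned char)buf[hexsz]))
		return REF_REGULAR;
	return REF_BAD;
}
```

The three bad inputs used by the new tests ("<oid>x", "xfsazqfxcadas"
and "Xfsazqfxcadas") all fall into the last case.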

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  48 ++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 105 ++++++++++++++++++++++++++++++++++
 4 files changed, 157 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8bfdce64bc..f81b4c8dd5 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3505,6 +3505,53 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refname,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *target_name,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct fsck_ref_report report = { 0 };
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	report.path = target_name;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {
+		/*
+		 * Ref file could be removed by another concurrent process. We should
+		 * ignore this error and continue to the next ref.
+		 */
+		if (errno == ENOENT)
+			goto cleanup;
+
+		ret = error_errno(_("cannot read ref file '%s'"),
+				  iter->path.buf);
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		strbuf_rtrim(&ref_content);
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "%s", ref_content.buf);
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refname,
@@ -3600,6 +3647,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 1e17393a3d..162370077b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -158,4 +158,109 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 	done
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/branch-bad &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/a/b/branch-bad &&
+		test_cmp expect err || return 1
+	done
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	bad_content_1=$(git rev-parse main)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
+	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
+	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
+test_expect_success 'ref content checks should work with worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree1_refdir_prefix/bad-branch-1 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content
+		EOF
+		rm $worktree1_refdir_prefix/bad-branch-1 &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree2_refdir_prefix/bad-branch-2 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content
+		EOF
+		rm $worktree2_refdir_prefix/bad-branch-2 &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_done
-- 
2.47.0



* [PATCH v8 6/9] ref: add more strict checks for regular refs
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (4 preceding siblings ...)
  2024-11-14 16:54                 ` [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-11-14 16:54                 ` shejialuo
  2024-11-14 16:54                 ` [PATCH v8 7/9] ref: add basic symref content check for files backend shejialuo
                                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:54 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already used the "parse_loose_ref_contents" function to check
whether the ref content is valid in the files backend. However, by
using "parse_loose_ref_contents", we allow the ref's content to end with
garbage or without a newline.

Even though we never create such loose refs ourselves, we have accepted
such loose refs. So, it is entirely possible that some third-party tools
may rely on such loose refs being valid. We should therefore not report
them as fsck errors for now. Instead, we should notify users about such
"curiously formatted" loose refs so that adequate care is taken before
we decide to tighten the rules in the future.

Nor is it suitable to report a WARN fsck message to the user. We don't
yet want the "--strict" flag that controls this bit to end up
generating errors for such weirdly-formatted reference contents, as we
first want to assess whether this retroactive tightening will cause
issues for any tools out there; it may cause compatibility issues that
could break existing repositories. So, we add the following two fsck
infos to represent the situations where the ref content ends without a
newline or has trailing garbage:

1. refMissingNewline(INFO): A loose ref that does not end with a
   newline (LF).
2. trailingRefContent(INFO): A loose ref has trailing content.
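The two situations above can be told apart from a genuinely malformed
ref with a few byte checks after the object ID. Below is a minimal,
standalone sketch in plain C (not written against Git's internal APIs);
the enum and `classify_regular_ref()` are hypothetical names. The order
of the checks follows the `files_fsck_refs_content` hunk in this patch,
and treating a non-whitespace byte glued to the ID as invalid is
consistent with the badRefContent tests, where "<oid>x" is rejected.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not Git's fsck message IDs. */
enum ref_content_kind {
	REF_CONTENT_OK,        /* exactly "<hex>\n" */
	REF_CONTENT_BAD,       /* no valid object ID, or junk glued to it */
	REF_CONTENT_NO_LF,     /* "<hex>" with nothing after it */
	REF_CONTENT_TRAILING,  /* "<hex>" followed by more than a single LF */
};

/*
 * Classify the contents of a regular (non-symbolic) loose ref.
 * hexsz is the hex length of an object ID: 40 for SHA-1, 64 for SHA-256.
 */
static enum ref_content_kind classify_regular_ref(const char *buf, size_t hexsz)
{
	const char *trailing;
	size_t i;

	/* Stops at the first non-hex byte, including the terminating NUL. */
	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return REF_CONTENT_BAD;
	trailing = buf + hexsz;

	/* A non-whitespace byte right after the ID makes the content invalid. */
	if (*trailing && !isspace((unsigned char)*trailing))
		return REF_CONTENT_BAD;

	/* Same order as the patch: missing LF first, then trailing garbage. */
	if (!*trailing)
		return REF_CONTENT_NO_LF;
	if (trailing[0] != '\n' || trailing[1])
		return REF_CONTENT_TRAILING;
	return REF_CONTENT_OK;
}
```

Only REF_CONTENT_BAD corresponds to an error; the other two non-OK
results map onto the INFO-level messages introduced here.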

It might appear that we can't provide the user with any warnings by
using FSCK_INFO. However, "fsck.c::fsck_vreport" converts FSCK_INFO to
FSCK_WARN, so we can still warn the user about these situations when
using "git refs verify" without introducing compatibility issues.
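To see why INFO rather than WARN is the right level, here is a
simplified model of the severity pipeline. It assumes (from our reading
of fsck.c, not from this patch) that "--strict" promotes only WARN to
ERROR when a message's type is looked up, and that "fsck_vreport" later
prints INFO as WARN, as described above. The enum and function names
are hypothetical.

```c
#include <stdbool.h>

/* Simplified stand-in for Git's fsck message types (hypothetical names). */
enum msg_type { MSG_IGNORE, MSG_INFO, MSG_WARN, MSG_ERROR };

/* Lookup step: under --strict only WARN is promoted; INFO is untouched. */
static enum msg_type lookup_type(enum msg_type dflt, bool strict)
{
	if (strict && dflt == MSG_WARN)
		return MSG_ERROR;
	return dflt;
}

/* Report step: an INFO message is printed as a warning. */
static enum msg_type reported_type(enum msg_type dflt, bool strict)
{
	enum msg_type type = lookup_type(dflt, strict);

	if (type == MSG_INFO)
		type = MSG_WARN;
	return type;
}
```

Under this model an INFO message is shown as a warning with or without
"--strict", whereas a WARN message would turn into an error under
"--strict", which is the outcome the patch wants to avoid for now.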

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt | 14 +++++++++
 fsck.h                        |  2 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 26 ++++++++++++++--
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 57 +++++++++++++++++++++++++++++++++--
 6 files changed, 96 insertions(+), 7 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 22c385ea22..6db0eaa84a 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -173,6 +173,20 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A loose ref that does not end with newline(LF). As
+	valid implementations of Git never created such a loose ref
+	file, it may become an error in the future. Report to the
+	git@vger.kernel.org mailing list if you see this error, as
+	we need to know what tools created such a file.
+
+`trailingRefContent`::
+	(INFO) A loose ref has trailing content. As valid implementations
+	of Git never created such a loose ref file, it may become an
+	error in the future. Report to the git@vger.kernel.org mailing
+	list if you see this error, as we need to know what tools
+	created such a file.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 0d99a87911..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -85,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 395a17273c..f88b32a633 100644
--- a/refs.c
+++ b/refs.c
@@ -1789,7 +1789,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index f81b4c8dd5..a325b102b8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -569,7 +569,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -606,7 +606,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -628,6 +628,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3513,6 +3517,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
+	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
 	struct object_id oid;
@@ -3538,7 +3543,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
 				     ref_content.buf, &oid, &referent,
-				     &type, &failure_errno)) {
+				     &type, &trailing, &failure_errno)) {
 		strbuf_rtrim(&ref_content);
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
@@ -3546,6 +3551,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
+	if (!(type & REF_ISSYMREF)) {
+		if (!*trailing) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "misses LF at the end");
+			goto cleanup;
+		}
+		if (*trailing != '\n' || *(trailing + 1)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "has trailing garbage: '%s'", trailing);
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 037d7991cd..125f1fe735 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -716,7 +716,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 162370077b..33e7a390ad 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -189,7 +189,48 @@ test_expect_success 'regular ref content should be checked (individual)' '
 		EOF
 		rm $branch_dir_prefix/a/b/branch-bad &&
 		test_cmp expect err || return 1
-	done
+	done &&
+
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	for trailing_content in " garbage" "    more garbage"
+	do
+		printf "%s" "$(git rev-parse main)$trailing_content" >$branch_dir_prefix/branch-garbage &&
+		git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\''$trailing_content'\''
+		EOF
+		rm $branch_dir_prefix/branch-garbage &&
+		test_cmp expect err || return 1
+	done &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
+
+
+	'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage-special &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
+
+
+	  garbage'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage-special &&
+	test_cmp expect err
 '
 
 test_expect_success 'regular ref content should be checked (aggregate)' '
@@ -207,12 +248,16 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
 	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
 	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
 	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
 	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
@@ -260,7 +305,15 @@ test_expect_success 'ref content checks should work with worktrees' '
 		EOF
 		rm $worktree2_refdir_prefix/bad-branch-2 &&
 		test_cmp expect err || return 1
-	done
+	done &&
+
+	printf "%s" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err
 '
 
 test_done
-- 
2.47.0



* [PATCH v8 7/9] ref: add basic symref content check for files backend
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (5 preceding siblings ...)
  2024-11-14 16:54                 ` [PATCH v8 6/9] ref: add more strict checks for regular refs shejialuo
@ 2024-11-14 16:54                 ` shejialuo
  2024-11-14 16:54                 ` [PATCH v8 8/9] ref: check whether the target of the symref is a ref shejialuo
                                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:54 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have code that checks regular ref contents, but we do not yet check
the contents of symbolic refs. By using "parse_loose_ref_contents" on
symbolic refs, we obtain the "referent".

We do not need to check the "referent" by opening the file. This is
because if the "referent" exists in the filesystem, we will eventually
check its correctness by inspecting every file in the "refs" directory.
If the "referent" does not exist in the filesystem, this is OK, as it is
treated as a dangling symref.

So we just need to check the "referent" string content. A ref file is
accepted as a textual symref if its content begins with "ref:",
followed by zero or more whitespace characters, followed by the full
refname, followed only by whitespace characters. However, we always
write a single SP after "ref:" and a single LF after the refname. It may
seem that we should report a fsck error when the content deviates from
this canonical form, but we should not be so aggressive, because
third-party reimplementations of Git may have taken advantage of the
looser syntax. To put it more specifically, we accept the following
contents:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"
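All three contents reach the fsck check with their trailing bytes
intact, because the symref branch of the parser only strips the "ref:"
prefix and any leading whitespace. A hypothetical standalone sketch of
that behaviour (`symref_referent()` is not a Git function):

```c
#include <ctype.h>
#include <string.h>

/*
 * If buf is a textual symref ("ref:" prefix), skip the prefix and any
 * leading whitespace and return the rest untrimmed; otherwise NULL.
 */
static const char *symref_referent(const char *buf)
{
	if (strncmp(buf, "ref:", 4))
		return NULL;
	buf += 4;
	while (isspace((unsigned char)*buf))
		buf++;
	return buf; /* trailing whitespace and newlines are still present */
}
```

This is why the checks below have to look at the end of the referent
themselves rather than relying on the parser to reject it.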

When introducing the regular ref content checks, we created two fsck
infos, "refMissingNewline" and "trailingRefContent", which exactly
represent the above situations. So we reuse these two fsck messages to
inform the user about these situations.

But we do not allow any other trailing garbage. The following are bad
symref contents, which will be reported as fsck errors by "git-fsck(1)":

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

And we introduce a new "badReferentName(ERROR)" fsck message to report
the above errors, using "is_root_ref" and "check_refname_format" to
check the "referent". Since neither "is_root_ref" nor
"check_refname_format" accepts whitespace, we pass the trimmed version
of "referent" to these functions.
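For illustration, here is a deliberately simplified subset of the
refname rules, just enough to reproduce the verdicts in this patch's
tests (for example, "refs/heads/.branch" and "refs/heads/~branch" are
rejected). `refname_ok_sketch()` is hypothetical and is not a
replacement for "check_refname_format", whose full rules are listed in
git-check-ref-format(1). It assumes that "check_refname_format" with
flags 0 rejects one-level names such as "HEAD", which would explain why
root refs are tested separately with "is_root_ref".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Simplified subset of Git's refname rules, for illustration only:
 * - at least two slash-separated components (one-level names rejected);
 * - no empty component and no component starting with '.';
 * - no "..", no whitespace or control characters, none of ~ ^ : ? * [ \.
 */
static bool refname_ok_sketch(const char *name)
{
	const char *p;
	int components = 1;
	char prev = '/';

	if (!*name)
		return false;
	for (p = name; *p; p++) {
		unsigned char c = (unsigned char)*p;

		if (c <= ' ' || c == 0x7f || strchr("~^:?*[\\", c))
			return false;
		if (c == '.' && (prev == '/' || prev == '.'))
			return false;
		if (c == '/') {
			if (prev == '/')
				return false; /* empty component */
			components++;
		}
		prev = (char)c;
	}
	if (prev == '/')
		return false; /* trailing slash */
	return components >= 2;
}
```

Note that an untrimmed referent such as "refs/heads/master\n" fails
this check because of the newline, which is why trimming must happen
before the refname check.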

In order to add these checks, we do the following:

1. Record the untrimmed length "orig_len" and the untrimmed last byte
   "orig_last_byte".
2. Use "strbuf_rtrim" to trim trailing whitespace and newlines so that
   "is_root_ref" and "check_refname_format" do not fail because of them.
3. Use "orig_len" and "orig_last_byte" to check whether the "referent"
   is missing '\n' at the end or has trailing whitespace or newlines.
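The three steps can be sketched in standalone C as follows.
`check_referent_tail()` and `struct referent_findings` are hypothetical
names; the trimming mirrors "strbuf_rtrim" (which strips trailing
isspace() bytes) and the two conditions are copied from
"files_fsck_symref_target" in this patch. Unlike the patch, the sketch
returns early for an empty referent, since reading the last byte of an
empty buffer would be out of bounds; whether an empty referent can
actually reach that function is worth double-checking.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

struct referent_findings {
	bool missing_newline;   /* would report refMissingNewline */
	bool trailing_content;  /* would report trailingRefContent */
	size_t trimmed_len;     /* length after right-trimming whitespace */
};

static struct referent_findings check_referent_tail(const char *referent)
{
	size_t orig_len = strlen(referent);
	struct referent_findings f = { false, false, orig_len };
	char orig_last_byte;

	/* Unlike the patch, guard against reading referent[-1]. */
	if (!orig_len)
		return f;

	/* Step 1: remember the untrimmed length and last byte. */
	orig_last_byte = referent[orig_len - 1];

	/* Step 2: right-trim whitespace, as strbuf_rtrim() does. */
	while (f.trimmed_len &&
	       isspace((unsigned char)referent[f.trimmed_len - 1]))
		f.trimmed_len--;

	/* Step 3a: nothing trimmed, or the trimmed tail did not end in LF. */
	if (f.trimmed_len == orig_len ||
	    (f.trimmed_len < orig_len && orig_last_byte != '\n'))
		f.missing_newline = true;

	/* Step 3b: more than the single expected LF was trimmed. */
	if (f.trimmed_len != orig_len && f.trimmed_len != orig_len - 1)
		f.trailing_content = true;

	return f;
}
```

Fed the referents from the tests, this reproduces the expected
warnings: "refs/heads/branch     " reports both, "refs/heads/branch\n\n"
and "refs/heads/branch \n" report only trailing content, and
"refs/heads/branch" reports only the missing newline.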

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  40 ++++++++++++
 t/t0602-reffiles-fsck.sh      | 111 ++++++++++++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 6db0eaa84a..dcea05edfc 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..5227dfdef2 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index a325b102b8..c496006db1 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3509,6 +3509,43 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refname,
 				  struct dir_iterator *iter);
 
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent)
+{
+	char orig_last_byte;
+	size_t orig_len;
+	int ret = 0;
+
+	orig_len = referent->len;
+	orig_last_byte = referent->buf[orig_len - 1];
+	strbuf_rtrim(referent);
+
+	if (!is_root_ref(referent->buf) &&
+	    check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_NAME,
+				      "points to invalid refname '%s'", referent->buf);
+		goto out;
+	}
+
+	if (referent->len == orig_len ||
+	    (referent->len < orig_len && orig_last_byte != '\n')) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "misses LF at the end");
+	}
+
+	if (referent->len != orig_len && referent->len != orig_len - 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "has trailing whitespaces or newlines");
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *target_name,
@@ -3564,6 +3601,9 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "has trailing garbage: '%s'", trailing);
 			goto cleanup;
 		}
+	} else {
+		ret = files_fsck_symref_target(o, &report, &referent);
+		goto cleanup;
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 33e7a390ad..ee1e5f2864 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -263,6 +263,109 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	for good_referent in "refs/heads/branch" "HEAD"
+	do
+		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
+		git refs verify 2>err &&
+		rm $branch_dir_prefix/branch-good &&
+		test_must_be_empty err || return 1
+	done &&
+
+	for bad_referent in "refs/heads/.branch" "refs/heads/~branch" "refs/heads/?branch"
+	do
+		printf "ref: %s\n" $bad_referent >$branch_dir_prefix/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-bad: badReferentName: points to invalid refname '\''$bad_referent'\''
+		EOF
+		rm $branch_dir_prefix/branch-bad &&
+		test_cmp expect err || return 1
+	done &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -313,6 +416,14 @@ test_expect_success 'ref content checks should work with worktrees' '
 	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $worktree1_refdir_prefix/branch-garbage &&
 	test_cmp expect err
 '
 
-- 
2.47.0



* [PATCH v8 8/9] ref: check whether the target of the symref is a ref
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (6 preceding siblings ...)
  2024-11-14 16:54                 ` [PATCH v8 7/9] ref: add basic symref content check for files backend shejialuo
@ 2024-11-14 16:54                 ` shejialuo
  2024-11-14 16:55                 ` [PATCH v8 9/9] ref: add symlink ref content check for files backend shejialuo
                                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:54 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Ideally, we want users to use "git symbolic-ref" to create symrefs
instead of writing raw contents into the filesystem. However, "git
symbolic-ref" is strict with the refname but not with the referent.
For example, we can make the "referent" point to "$(gitdir)/logs/aaa"
and manually write an object ID into that file; "git rev-parse" can
still successfully resolve this symref:

  $ git init repo && cd repo && git commit --allow-empty -mx
  $ git symbolic-ref refs/heads/test logs/aaa
  $ echo $(git rev-parse HEAD) > .git/logs/aaa
  $ git rev-parse test

We may need to add some restrictions on the "referent" parameter when
using "git symbolic-ref" to create symrefs, because ideally all
non-pseudo refs should be located under the "refs" directory. We may
tighten this in the future.

In order to tell the user that we may tighten the rules for the above
situation, create a new fsck message "symrefTargetIsNotARef" to notify
the user that this may become an error in the future.
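A standalone sketch of the new classification. `is_root_ref_sketch()`
is a hypothetical, simplified stand-in for "is_root_ref": it only
accepts all-uppercase names equal to "HEAD" or ending in "_HEAD", which
is an assumption of this sketch and not Git's full rule. The three-way
condition itself is copied from the "files_fsck_symref_target" hunk in
this patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical simplification of is_root_ref(); not Git's real rule. */
static bool is_root_ref_sketch(const char *name)
{
	size_t len = strlen(name);
	const char *p;

	for (p = name; *p; p++)
		if (!((*p >= 'A' && *p <= 'Z') || *p == '_'))
			return false;
	return !strcmp(name, "HEAD") ||
	       (len > 5 && !strcmp(name + len - 5, "_HEAD"));
}

static bool has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/* Mirrors the patch: flag unless root ref, "refs/" or "worktrees/". */
static bool symref_target_is_not_a_ref(const char *referent)
{
	return !is_root_ref_sketch(referent) &&
	       !has_prefix(referent, "refs/") &&
	       !has_prefix(referent, "worktrees/");
}
```

Because this is only a prefix test, "refs-back/heads/branch" is flagged
here yet still passes the refname-format check, so it produces just the
INFO-level message, as the new test expects.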

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 +++++++++
 fsck.h                        |  1 +
 refs/files-backend.c          | 14 ++++++++++++--
 t/t0602-reffiles-fsck.sh      | 29 +++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index dcea05edfc..f82ebc58e8 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,15 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symrefTargetIsNotARef`::
+	(INFO) The target of a symbolic reference points neither to
+	a root reference nor to a reference starting with "refs/".
+	Although we allow creating a symref pointing to a referent that
+	is outside "refs/" by using `git symbolic-ref`, we may tighten
+	the rule in the future. Report to the git@vger.kernel.org
+	mailing list if you see this error, as we need to know what tools
+	created such a file.
+
 `trailingRefContent`::
 	(INFO) A loose ref has trailing content. As valid implementations
 	of Git never created such a loose ref file, it may become an
diff --git a/fsck.h b/fsck.h
index 5227dfdef2..53a47612e6 100644
--- a/fsck.h
+++ b/fsck.h
@@ -87,6 +87,7 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index c496006db1..edf73d6cce 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3513,6 +3513,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    struct strbuf *referent)
 {
+	int is_referent_root;
 	char orig_last_byte;
 	size_t orig_len;
 	int ret = 0;
@@ -3521,8 +3522,17 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	orig_last_byte = referent->buf[orig_len - 1];
 	strbuf_rtrim(referent);
 
-	if (!is_root_ref(referent->buf) &&
-	    check_refname_format(referent->buf, 0)) {
+	is_referent_root = is_root_ref(referent->buf);
+	if (!is_referent_root &&
+	    !starts_with(referent->buf, "refs/") &&
+	    !starts_with(referent->buf, "worktrees/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF,
+				      "points to non-ref target '%s'", referent->buf);
+
+	}
+
+	if (!is_referent_root && check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT_NAME,
 				      "points to invalid refname '%s'", referent->buf);
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index ee1e5f2864..692b30727a 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -366,6 +366,35 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'the target of the textual symref should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	for good_referent in "refs/heads/branch" "HEAD" "refs/tags/tag"
+	do
+		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
+		git refs verify 2>err &&
+		rm $branch_dir_prefix/branch-good &&
+		test_must_be_empty err || return 1
+	done &&
+
+	for nonref_referent in "refs-back/heads/branch" "refs-back/tags/tag" "reflogs/refs/heads/branch"
+	do
+		printf "ref: %s\n" $nonref_referent >$branch_dir_prefix/branch-bad-1 &&
+		git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: refs/heads/branch-bad-1: symrefTargetIsNotARef: points to non-ref target '\''$nonref_referent'\''
+		EOF
+		rm $branch_dir_prefix/branch-bad-1 &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0



* [PATCH v8 9/9] ref: add symlink ref content check for files backend
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (7 preceding siblings ...)
  2024-11-14 16:54                 ` [PATCH v8 8/9] ref: check whether the target of the symref is a ref shejialuo
@ 2024-11-14 16:55                 ` shejialuo
  2024-11-15 11:10                 ` [PATCH v8 0/9] add " shejialuo
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-14 16:55 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Besides textual symrefs, we also allow symbolic links as symrefs. So
we should also provide the same consistency check for them as we do
for textual symrefs. We are also considering deprecating writing
symbolic links, but first we need to assess whether symbolic links are
still used. So, add a new fsck message "symlinkRef(INFO)" to make the
user aware of this.

We have already introduced "files_fsck_symref_target". We should reuse
this function to handle symrefs that use legacy symbolic links. We
should not check for trailing garbage in symbolic-link symrefs. Add a
new parameter "symbolic_link" to disable the checks which should only
be executed for textual symrefs.

We also need to generate the "referent" parameter in order to reuse
"files_fsck_symref_target", using the following steps:

1. Use "strbuf_add_real_path" to resolve the symlink and get the
   absolute path "ref_content" that the symlink ref points to.
2. Generate the absolute path "abs_gitdir" of "gitdir", and strip
   "abs_gitdir" from the front of "ref_content" to extract the relative
   path "relative_referent_path".
3. If "ref_content" is outside of "gitdir", we just set "referent" to
   "ref_content". Otherwise, we set "referent" to
   "relative_referent_path".
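The three steps can be sketched in standalone C as follows. symlink_referent() is a hypothetical helper, not a Git function; it assumes both arguments are already absolute and normalized (as strbuf_add_real_path() and strbuf_normalize_path() would leave them) and imitates skip_prefix() with strncmp().

```c
#include <string.h>

/*
 * Hypothetical helper: return the referent to check for a symlink ref
 * whose fully resolved target is "real_path".
 */
static const char *symlink_referent(const char *real_path,
				    const char *abs_gitdir)
{
	size_t len = strlen(abs_gitdir);

	/* Ignore trailing '/' so "/repo/.git" and "/repo/.git/" behave alike. */
	while (len && abs_gitdir[len - 1] == '/')
		len--;

	/* Step 2: strip "<abs_gitdir>/" if it prefixes the target. */
	if (!strncmp(real_path, abs_gitdir, len) && real_path[len] == '/')
		return real_path + len + 1;

	/* Step 3: the target lies outside gitdir; keep the absolute path. */
	return real_path;
}
```

For example, a symlink in ".git/refs/heads" pointing to "../../logs/branch-escape" resolves inside the gitdir, so the referent becomes "logs/branch-escape", which is what the symrefTargetIsNotARef test below expects. Requiring a '/' right after the prefix keeps a sibling such as "/repo/.gitfoo" from being mistaken for a path inside "/repo/.git".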

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   6 ++
 fsck.h                        |   1 +
 refs/files-backend.c          |  38 ++++++++-
 t/t0602-reffiles-fsck.sh      | 141 ++++++++++++++++++++++++++++++++++
 4 files changed, 182 insertions(+), 4 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index f82ebc58e8..b14bc44ca4 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,12 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symlinkRef`::
+	(INFO) A symbolic link is used as a symref. Report to the
+	git@vger.kernel.org mailing list if you see this error, as we
+	are assessing the feasibility of dropping the support to drop
+	creating symbolic links as symrefs.
+
 `symrefTargetIsNotARef`::
 	(INFO) The target of a symbolic reference points neither to
 	a root reference nor to a reference starting with "refs/".
diff --git a/fsck.h b/fsck.h
index 53a47612e6..a44c231a5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -86,6 +86,7 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
 	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index edf73d6cce..c715e411f3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,6 +1,7 @@
 #define USE_THE_REPOSITORY_VARIABLE
 
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../config.h"
 #include "../copy.h"
 #include "../environment.h"
@@ -3511,7 +3512,8 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
-				    struct strbuf *referent)
+				    struct strbuf *referent,
+				    unsigned int symbolic_link)
 {
 	int is_referent_root;
 	char orig_last_byte;
@@ -3520,7 +3522,8 @@ static int files_fsck_symref_target(struct fsck_options *o,
 
 	orig_len = referent->len;
 	orig_last_byte = referent->buf[orig_len - 1];
-	strbuf_rtrim(referent);
+	if (!symbolic_link)
+		strbuf_rtrim(referent);
 
 	is_referent_root = is_root_ref(referent->buf);
 	if (!is_referent_root &&
@@ -3539,6 +3542,9 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
+	if (symbolic_link)
+		goto out;
+
 	if (referent->len == orig_len ||
 	    (referent->len < orig_len && orig_last_byte != '\n')) {
 		ret = fsck_report_ref(o, report,
@@ -3562,6 +3568,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct dir_iterator *iter)
 {
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
 	const char *trailing = NULL;
@@ -3572,8 +3579,30 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	report.path = target_name;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char *relative_referent_path = NULL;
+
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->repo->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&ref_content, iter->path.buf);
+		skip_prefix(ref_content.buf, abs_gitdir.buf,
+			    &relative_referent_path);
+
+		if (relative_referent_path)
+			strbuf_addstr(&referent, relative_referent_path);
+		else
+			strbuf_addbuf(&referent, &ref_content);
+
+		ret |= files_fsck_symref_target(o, &report, &referent, 1);
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {
 		/*
@@ -3612,13 +3641,14 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 			goto cleanup;
 		}
 	} else {
-		ret = files_fsck_symref_target(o, &report, &referent);
+		ret = files_fsck_symref_target(o, &report, &referent, 0);
 		goto cleanup;
 	}
 
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 692b30727a..f8f27cfc6c 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -395,6 +395,147 @@ test_expect_success 'the target of the textual symref should be checked' '
 	done
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-escape $branch_dir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   " $branch_dir_prefix/branch-symbolic-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-bad: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-bad: badReferentName: points to invalid refname '\''refs/heads/branch   '\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-bad &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferentName: points to invalid refname '\''refs/tags/.tag'\''
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err
+'
+
+test_expect_success SYMLINKS 'symlink symref content should be checked (worktree)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	main_worktree_refdir_prefix=.git/refs/heads &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	ln -sf ../../../../refs/heads/good-branch $worktree1_refdir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $worktree1_refdir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../../../worktrees/worktree-1/good-branch $worktree2_refdir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-2/refs/worktree/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $worktree2_refdir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../worktrees/worktree-2/good-branch $main_worktree_refdir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $main_worktree_refdir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../../../logs/branch-escape $worktree1_refdir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: worktrees/worktree-1/refs/worktree/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
+	EOF
+	rm $worktree1_refdir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	for bad_referent_name in ".tag" "branch   "
+	do
+		ln -sf ./"$bad_referent_name" $worktree1_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-1/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-1/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''worktrees/worktree-1/refs/worktree/$bad_referent_name'\''
+		EOF
+		rm $worktree1_refdir_prefix/bad-symbolic &&
+		test_cmp expect err &&
+
+		ln -sf ../../../../refs/heads/"$bad_referent_name" $worktree1_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-1/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-1/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''refs/heads/$bad_referent_name'\''
+		EOF
+		rm $worktree1_refdir_prefix/bad-symbolic &&
+		test_cmp expect err &&
+
+		ln -sf ./"$bad_referent_name" $worktree2_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-2/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-2/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''worktrees/worktree-2/refs/worktree/$bad_referent_name'\''
+		EOF
+		rm $worktree2_refdir_prefix/bad-symbolic &&
+		test_cmp expect err &&
+
+		ln -sf ../../../../refs/heads/"$bad_referent_name" $worktree2_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-2/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-2/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''refs/heads/$bad_referent_name'\''
+		EOF
+		rm $worktree2_refdir_prefix/bad-symbolic &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0


* Re: [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-11-14 16:54                 ` [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-11-15  7:11                   ` Patrick Steinhardt
  2024-11-15 11:08                     ` shejialuo
  0 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-15  7:11 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Fri, Nov 15, 2024 at 12:54:28AM +0800, shejialuo wrote:
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {

Nit: there's one space too many here now.

> +		/*
> +		 * Ref file could be removed by another concurrent process. We should
> +		 * ignore this error and continue to the next ref.
> +		 */
> +		if (errno == ENOENT)
> +			goto cleanup;
> +
> +		ret = error_errno(_("cannot read ref file '%s': %s"),
> +				  iter->path.buf, strerror(errno));
> +		goto cleanup;
> +	}

You report `errno` twice. This should be:

	ret = error_errno(_("cannot read ref file '%s'"), iter->path.buf);
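To illustrate why the suggested form is right: Git's error_errno() already appends ": <strerror(errno)>" to the message, so also passing strerror(errno) as an argument reports the error text twice. The sketch below uses a hypothetical format_with_errno() that only imitates that behavior by writing into a buffer; it is not Git's usage.c code.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for error_errno(): format the caller's message,
 * then append ": <strerror(errno)>" exactly once.
 */
static int format_with_errno(char *buf, size_t size, const char *fmt, ...)
{
	int saved_errno = errno;	/* capture before other calls can change it */
	va_list ap;
	size_t len;

	if (!size)
		return -1;

	va_start(ap, fmt);
	vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	/* Append the errno description once, as error_errno() does. */
	len = strlen(buf);
	snprintf(buf + len, size - len, ": %s", strerror(saved_errno));
	return -1;	/* like error_errno(), signal failure to the caller */
}
```

Called with the original arguments (a "%s" for strerror(errno) in the format), the errno description ends up in the message twice.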

Other than that this version looks good to me, thanks!

Patrick

* Re: [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-11-15  7:11                   ` Patrick Steinhardt
@ 2024-11-15 11:08                     ` shejialuo
  0 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-15 11:08 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano

On Fri, Nov 15, 2024 at 08:11:01AM +0100, Patrick Steinhardt wrote:
> On Fri, Nov 15, 2024 at 12:54:28AM +0800, shejialuo wrote:
> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {
> 
> Nit: there's a space too much here now.
> 

I will improve this in the next version.

> > +		/*
> > +		 * Ref file could be removed by another concurrent process. We should
> > +		 * ignore this error and continue to the next ref.
> > +		 */
> > +		if (errno == ENOENT)
> > +			goto cleanup;
> > +
> > +		ret = error_errno(_("cannot read ref file '%s': %s"),
> > +				  iter->path.buf, strerror(errno));
> > +		goto cleanup;
> > +	}
> 
> You report `errno` twice. This should be:
> 
> 	ret = error_errno(_("cannot read ref file '%s'"), iter->path.buf);
> 
> Other than that this version looks good to me, thanks!
> 

Oops, I didn't think about it; I just copied it. I will fix this in the
next version.

> Patrick

* Re: [PATCH v8 0/9] add ref content check for files backend
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (8 preceding siblings ...)
  2024-11-14 16:55                 ` [PATCH v8 9/9] ref: add symlink ref content check for files backend shejialuo
@ 2024-11-15 11:10                 ` shejialuo
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
  10 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-15 11:10 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

On Fri, Nov 15, 2024 at 12:51:49AM +0800, shejialuo wrote:
> Hi all:
> 
> This new version solves the following problem:
> 
> 1. When reading the content of the ref file fails, we no longer use
> the "fsck_report_ref" function, as it is not suitable there.
> 2. Add a new symlink worktree test in the last patch. While writing
> the tests, I found a bug, which is fixed as described below.
> 
> Because we have introduced the check for worktrees, we should not use
> "ref_store->gitdir"; instead, we need to use "ref_store->repo->gitdir" to
> get the main worktree "gitdir". After fixing this, the test passes.
> 
> Thanks to Patrick for reminding me about this. I forgot to add a test,
> which is why the mistake slipped through.
> 
> Thanks,
> Jialuo

I'd like to wait a couple of days for more reviews and comments from
Junio and Karthik.


* [PATCH v9 0/9] add ref content check for files backend
  2024-11-14 16:51               ` [PATCH v8 " shejialuo
                                   ` (9 preceding siblings ...)
  2024-11-15 11:10                 ` [PATCH v8 0/9] add " shejialuo
@ 2024-11-20 11:47                 ` shejialuo
  2024-11-20 11:51                   ` [PATCH v9 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
                                     ` (9 more replies)
  10 siblings, 10 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:47 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Hi All:

This version fixes two problems:

1. Remove unnecessary space.
2. Drop extra "strerror(errno)".

Thanks,
Jialuo

shejialuo (9):
  ref: initialize "fsck_ref_report" with zero
  ref: check the full refname instead of basename
  ref: initialize ref name outside of check functions
  ref: support multiple worktrees check for refs
  ref: port git-fsck(1) regular refs check for files backend
  ref: add more strict checks for regular refs
  ref: add basic symref content check for files backend
  ref: check whether the target of the symref is a ref
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  35 +++
 builtin/refs.c                |  10 +-
 fsck.h                        |   6 +
 refs.c                        |   7 +-
 refs.h                        |   3 +-
 refs/debug.c                  |   5 +-
 refs/files-backend.c          | 194 +++++++++++-
 refs/packed-backend.c         |   8 +-
 refs/refs-internal.h          |   5 +-
 refs/reftable-backend.c       |   3 +-
 t/t0602-reffiles-fsck.sh      | 576 ++++++++++++++++++++++++++++++++--
 11 files changed, 790 insertions(+), 62 deletions(-)

Range-diff against v8:
 1:  bfb2a21af4 =  1:  bfb2a21af4 ref: initialize "fsck_ref_report" with zero
 2:  9efc83f7ea =  2:  9efc83f7ea ref: check the full refname instead of basename
 3:  5ea7d18203 =  3:  5ea7d18203 ref: initialize ref name outside of check functions
 4:  cb4669b64d =  4:  cb4669b64d ref: support multiple worktrees check for refs
 5:  c6c128c922 !  5:  d6188063d9 ref: port git-fsck(1) regular refs check for files backend
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +	if (S_ISLNK(iter->st.st_mode))
     +		goto cleanup;
     +
    -+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {
    ++	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
     +		/*
     +		 * Ref file could be removed by another concurrent process. We should
     +		 * ignore this error and continue to the next ref.
    @@ refs/files-backend.c: typedef int (*files_fsck_refs_fn)(struct ref_store *ref_st
     +		if (errno == ENOENT)
     +			goto cleanup;
     +
    -+		ret = error_errno(_("cannot read ref file '%s': %s"),
    -+				  iter->path.buf, strerror(errno));
    ++		ret = error_errno(_("cannot read ref file '%s'"), iter->path.buf);
     +		goto cleanup;
     +	}
     +
 6:  911fa42717 =  6:  e5e97ba3ad ref: add more strict checks for regular refs
 7:  7aa6a99206 =  7:  1dec0a56d2 ref: add basic symref content check for files backend
 8:  dbb0787ad1 =  8:  dcc4a02102 ref: check whether the target of the symref is a ref
 9:  a6d85b4864 !  9:  fc10862f6f ref: add symlink ref content check for files backend
    @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_s
      		goto cleanup;
     +	}
      
    - 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0 ) {
    + 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
      		/*
     @@ refs/files-backend.c: static int files_fsck_refs_content(struct ref_store *ref_store,
      			goto cleanup;
-- 
2.47.0


* [PATCH v9 1/9] ref: initialize "fsck_ref_report" with zero
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
@ 2024-11-20 11:51                   ` shejialuo
  2024-11-20 11:51                   ` [PATCH v9 2/9] ref: check the full refname instead of basename shejialuo
                                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:51 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" are NULL. So we need to always initialize these members to
NULL instead of leaving them pointing anywhere when creating a new
"fsck_ref_report" structure.

The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL (which implicitly 0-initializes other
members in the struct). It is more customary to use "{ 0 }" to express
that we are 0-initializing everything. In order to align with the
codebase, initialize "fsck_ref_report" with zero.
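A small sketch of the point: with brace or designated initialization, every member that is not named is zero-initialized, so pointer members become NULL either way; "{ 0 }" simply states that intent directly. The struct below is a simplified, hypothetical stand-in for "struct fsck_ref_report" (the member types are assumptions).

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct fsck_ref_report. */
struct ref_report_sketch {
	const char *path;
	const void *oid;	/* assumption: the real member points to an object ID */
	const char *referent;
};

/* Mirrors the "is it set?" checks an error callback relies on. */
static int report_has_oid(const struct ref_report_sketch *r)
{
	return r->oid != NULL;
}

static int report_has_referent(const struct ref_report_sketch *r)
{
	return r->referent != NULL;
}
```

Both "{ .path = NULL }" and "{ 0 }" leave "oid" and "referent" as NULL; the latter is the more customary spelling.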

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0824c0b8a9..03d2503276 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,7 +3520,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.47.0


* [PATCH v9 2/9] ref: check the full refname instead of basename
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
  2024-11-20 11:51                   ` [PATCH v9 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
@ 2024-11-20 11:51                   ` shejialuo
  2024-11-20 11:51                   ` [PATCH v9 3/9] ref: initialize ref name outside of check functions shejialuo
                                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:51 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

In "files-backend.c::files_fsck_refs_name", we validate the refname
format by using "check_refname_format" to check the basename of the
iterator with "REFNAME_ALLOW_ONELEVEL" flag.

However, this is a bad implementation. Although we don't allow a single
"@" in the ".git" directory, we do allow "refs/heads/@". Because only the
one-level refname "@" is checked, we wrongly report an error when there
is a "refs/heads/@" ref.

Because we only check the one-level refname, we also cannot check the
other components of the full refname, so we miss errors such as the
following:

  "refs/heads/ new-feature/test"
  "refs/heads/~new-feature/test"

In order to fix the above problems, enhance "files_fsck_refs_name" to use
the full name for "check_refname_format". Then, replace the tests that
are related to "@" and add tests exercising the above situations, using
a for loop to avoid repetition.
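The difference can be sketched with a deliberately tiny, hypothetical subset of the rules that check_refname_format() enforces (see git-check-ref-format(1)): the name must not be the single character "@", no component may begin with ".", and control characters, space and "~" are forbidden. refname_ok_sketch() is illustrative only, not Git's implementation.

```c
#include <stdbool.h>
#include <string.h>

/* A tiny subset of Git's refname rules, enough to show why the full name matters. */
static bool refname_ok_sketch(const char *refname)
{
	const char *p;

	/* The whole name may not be the single character "@". */
	if (!strcmp(refname, "@"))
		return false;

	for (p = refname; *p; p++) {
		unsigned char c = (unsigned char)*p;

		/* No component may begin with '.'. */
		if (c == '.' && (p == refname || p[-1] == '/'))
			return false;
		/* Control characters, space and '~' are forbidden anywhere. */
		if (c < 040 || c == 0177 || c == ' ' || c == '~')
			return false;
	}
	return true;
}
```

Checked as a basename, "@" is rejected and "test" is accepted; checked as full names, "refs/heads/@" is accepted while both "new-feature" names are rejected.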

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c     |  7 ++-
 t/t0602-reffiles-fsck.sh | 92 ++++++++++++++++++++++++----------------
 2 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 03d2503276..b055edc061 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3519,10 +3519,13 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock"))
 		goto cleanup;
 
-	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
+	/*
+	 * This works right now because we never check the root refs.
+	 */
+	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
+	if (check_refname_format(sb.buf, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..2a172c913d 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -18,63 +18,81 @@ test_expect_success 'ref name should be checked' '
 	cd repo &&
 
 	git commit --allow-empty -m initial &&
-	git checkout -b branch-1 &&
-	git tag tag-1 &&
-	git commit --allow-empty -m second &&
-	git checkout -b branch-2 &&
-	git tag tag-2 &&
-	git tag multi_hierarchy/tag-2 &&
+	git checkout -b default-branch &&
+	git tag default-tag &&
+	git tag multi_hierarchy/default-tag &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/heads/.branch-1: badRefName: invalid refname format
-	EOF
-	rm $branch_dir_prefix/.branch-1 &&
-	test_cmp expect err &&
-
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/heads/@: badRefName: invalid refname format
-	EOF
+	cp $branch_dir_prefix/default-branch $branch_dir_prefix/@ &&
+	git refs verify 2>err &&
+	test_must_be_empty err &&
 	rm $branch_dir_prefix/@ &&
-	test_cmp expect err &&
 
-	cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/@ &&
-	test_must_fail git refs verify 2>err &&
-	cat >expect <<-EOF &&
-	error: refs/tags/multi_hierarchy/@: badRefName: invalid refname format
-	EOF
-	rm $tag_dir_prefix/multi_hierarchy/@ &&
-	test_cmp expect err &&
-
-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
+	cp $tag_dir_prefix/default-tag $tag_dir_prefix/tag-1.lock &&
 	git refs verify 2>err &&
 	rm $tag_dir_prefix/tag-1.lock &&
 	test_must_be_empty err &&
 
-	cp $tag_dir_prefix/tag-1 $tag_dir_prefix/.lock &&
+	cp $tag_dir_prefix/default-tag $tag_dir_prefix/.lock &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/tags/.lock: badRefName: invalid refname format
 	EOF
 	rm $tag_dir_prefix/.lock &&
-	test_cmp expect err
+	test_cmp expect err &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/$refname: badRefName: invalid refname format
+		EOF
+		rm "$branch_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $tag_dir_prefix/default-tag "$tag_dir_prefix/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/$refname: badRefName: invalid refname format
+		EOF
+		rm "$tag_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		cp $tag_dir_prefix/multi_hierarchy/default-tag "$tag_dir_prefix/multi_hierarchy/$refname" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/multi_hierarchy/$refname: badRefName: invalid refname format
+		EOF
+		rm "$tag_dir_prefix/multi_hierarchy/$refname" &&
+		test_cmp expect err || return 1
+	done &&
+
+	for refname in ".refname-starts-with-dot" "~refname-has-stride"
+	do
+		mkdir "$branch_dir_prefix/$refname" &&
+		cp $branch_dir_prefix/default-branch "$branch_dir_prefix/$refname/default-branch" &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/$refname/default-branch: badRefName: invalid refname format
+		EOF
+		rm -r "$branch_dir_prefix/$refname" &&
+		test_cmp expect err || return 1
+	done
 '
 
 test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	branch_dir_prefix=.git/refs/heads &&
-	tag_dir_prefix=.git/refs/tags &&
 	cd repo &&
 	git commit --allow-empty -m initial &&
 	git checkout -b branch-1 &&
-	git tag tag-1 &&
-	git commit --allow-empty -m second &&
-	git checkout -b branch-2 &&
-	git tag tag-2 &&
 
 	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	git -c fsck.badRefName=warn refs verify 2>err &&
@@ -84,7 +102,7 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	rm $branch_dir_prefix/.branch-1 &&
 	test_cmp expect err &&
 
-	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+	cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
 	git -c fsck.badRefName=ignore refs verify 2>err &&
 	test_must_be_empty err
 '
-- 
2.47.0


* [PATCH v9 3/9] ref: initialize ref name outside of check functions
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
  2024-11-20 11:51                   ` [PATCH v9 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
  2024-11-20 11:51                   ` [PATCH v9 2/9] ref: check the full refname instead of basename shejialuo
@ 2024-11-20 11:51                   ` shejialuo
  2024-11-20 11:51                   ` [PATCH v9 4/9] ref: support multiple worktrees check for refs shejialuo
                                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:51 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We pass "refs_check_dir" to the "files_fsck_refs_name" function, which
uses it to construct the name of the ref being checked. However, every
new check function we introduce would have to allocate memory and
recompute the ref name again. Allocating redundant memory and
duplicating logic is bad. Instead, we should allocate and compute the
name only once and pass the ref name to the check functions.

To avoid the repeated computation, rename "refs_check_dir" to
"refname". In "files_fsck_refs_dir", create a new strbuf "refname";
whenever we handle a new ref, compute its name once and then call the
check functions one by one.
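A rough standalone sketch of the shape described above: the ref name is built once per ref and the same string is handed to each function in a NULL-terminated array, in the spirit of files_fsck_refs_fn. The check functions and run_checks() are hypothetical, and a fixed-size buffer stands in for the reused strbuf.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check-function type taking the precomputed ref name. */
typedef int (*ref_check_fn)(const char *refname);

/* Hypothetical check: the name must live under "refs/". */
static int check_refs_prefix(const char *refname)
{
	return strncmp(refname, "refs/", 5) ? -1 : 0;
}

/* Hypothetical check: no spaces anywhere in the name. */
static int check_no_space(const char *refname)
{
	return strchr(refname, ' ') ? -1 : 0;
}

static int run_checks(const char *refs_check_dir, const char *relative_path,
		      ref_check_fn *fns)
{
	char refname[1024];
	int ret = 0;
	int n;

	/* Compute "<dir>/<relative path>" exactly once for this ref. */
	n = snprintf(refname, sizeof(refname), "%s/%s",
		     refs_check_dir, relative_path);
	if (n < 0 || (size_t)n >= sizeof(refname))
		return -1;	/* name too long for this sketch */

	/* Every check sees the same precomputed name. */
	for (size_t i = 0; fns[i]; i++)
		if (fns[i](refname))
			ret = -1;
	return ret;
}
```

As in the patch, a failing check does not stop the loop, so all checks still run and report for the same ref.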

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index b055edc061..8edb700568 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3501,12 +3501,12 @@ static int files_ref_store_remove_on_disk(struct ref_store *ref_store,
  */
 typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  struct fsck_options *o,
-				  const char *refs_check_dir,
+				  const char *refname,
 				  struct dir_iterator *iter);
 
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
-				const char *refs_check_dir,
+				const char *refname,
 				struct dir_iterator *iter)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -3522,11 +3522,10 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 	/*
 	 * This works right now because we never check the root refs.
 	 */
-	strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
-	if (check_refname_format(sb.buf, 0)) {
+	if (check_refname_format(refname, 0)) {
 		struct fsck_ref_report report = { 0 };
 
-		report.path = sb.buf;
+		report.path = refname;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_NAME,
 				      "invalid refname format");
@@ -3542,6 +3541,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       const char *refs_check_dir,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
+	struct strbuf refname = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	struct dir_iterator *iter;
 	int iter_status;
@@ -3560,11 +3560,15 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 			continue;
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
+			strbuf_reset(&refname);
+			strbuf_addf(&refname, "%s/%s", refs_check_dir,
+				    iter->relative_path);
+
 			if (o->verbose)
-				fprintf_ln(stderr, "Checking %s/%s",
-					   refs_check_dir, iter->relative_path);
+				fprintf_ln(stderr, "Checking %s", refname.buf);
+
 			for (size_t i = 0; fsck_refs_fn[i]; i++) {
-				if (fsck_refs_fn[i](ref_store, o, refs_check_dir, iter))
+				if (fsck_refs_fn[i](ref_store, o, refname.buf, iter))
 					ret = -1;
 			}
 		} else {
@@ -3581,6 +3585,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 
 out:
 	strbuf_release(&sb);
+	strbuf_release(&refname);
 	return ret;
 }
 
-- 
2.47.0



* [PATCH v9 4/9] ref: support multiple worktrees check for refs
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
                                     ` (2 preceding siblings ...)
  2024-11-20 11:51                   ` [PATCH v9 3/9] ref: initialize ref name outside of check functions shejialuo
@ 2024-11-20 11:51                   ` shejialuo
  2024-11-20 11:51                   ` [PATCH v9 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
                                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:51 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have already set up the infrastructure to check the consistency of
refs, but it does not support multiple worktrees, whereas "git-fsck(1)"
does check the refs of worktrees. Since we aim for feature parity with
"git-fsck(1)", we need to add support for multiple worktrees.

Because each worktree has its own specific refs, instead of just showing
users "refs/worktree/foo", we need to display the full name, such as
"worktrees/<id>/refs/worktree/foo". To build the full name we need the
id of the worktree, so add a new "struct worktree *" parameter to
"refs-internal.h::fsck_fn" and change the related functions to follow
this new interface.

"packed-refs" only exists in the main worktree, so we should only check
it there. Use "is_main_worktree" to skip checking "packed-refs" in the
"packed_fsck" function for all other worktrees.
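The resulting control flow can be modeled in a few lines of C. This is
a simplified sketch, not Git's code. "struct wt_sketch",
check_loose_refs(), check_packed_refs() and verify_all_worktrees() are
hypothetical stand-ins for "struct worktree", the files backend check,
packed_fsck() and the loop in cmd_refs_verify(). The simulated results
in the struct are assumptions made so that the sketch runs on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Git's "struct worktree". */
struct wt_sketch {
	const char *id;  /* NULL for the main worktree */
	int loose_ok;    /* simulated result of the loose ref check */
	int packed_ok;   /* simulated result of the packed-refs check */
};

static int is_main(const struct wt_sketch *wt)
{
	return wt->id == NULL;
}

/* Like the fsck callbacks: 0 when clean, -1 when errors were found. */
static int check_loose_refs(const struct wt_sketch *wt)
{
	return wt->loose_ok ? 0 : -1;
}

static int check_packed_refs(const struct wt_sketch *wt)
{
	/* "packed-refs" only exists in the main worktree. */
	if (!is_main(wt))
		return 0;
	return wt->packed_ok ? 0 : -1;
}

/* Walk a NULL-terminated array and OR the per-worktree results. */
static int verify_all_worktrees(struct wt_sketch **wts)
{
	int ret = 0;

	assert(wts != NULL);
	for (size_t i = 0; wts[i]; i++)
		ret |= check_loose_refs(wts[i]) | check_packed_refs(wts[i]);
	return ret;
}
```

OR-ing the results means one bad worktree makes the whole command fail,
while every worktree is still checked and reported.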

Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add
the "worktrees/<id>/" prefix when we are not in the main worktree.
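The naming rule can be sketched in C as follows. This is a simplified
model that assumes a caller-provided fixed-size buffer; the patch itself
builds the name in a strbuf. ref_display_name() is a hypothetical
helper, and a NULL "wt_id" stands for the main worktree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the name shown to the user for a loose ref found at
 * "relative_path" below "refs_check_dir" (e.g. "refs"). For a linked
 * worktree, "worktrees/<id>/" is prepended. Returns the length of the
 * name, or -1 if it did not fit into "out".
 */
static int ref_display_name(char *out, size_t out_len, const char *wt_id,
			    const char *refs_check_dir,
			    const char *relative_path)
{
	int n;

	assert(out && refs_check_dir && relative_path);
	if (wt_id)
		n = snprintf(out, out_len, "worktrees/%s/%s/%s",
			     wt_id, refs_check_dir, relative_path);
	else
		n = snprintf(out, out_len, "%s/%s",
			     refs_check_dir, relative_path);

	return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

For example, "worktree/foo" in the worktree with id "worktree-1"
becomes "worktrees/worktree-1/refs/worktree/foo". The same ref in the
main worktree stays "refs/worktree/foo".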

Finally, add a new test that checks refnames in a repository with
multiple worktrees to exercise the code.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/refs.c           | 10 ++++++--
 refs.c                   |  5 ++--
 refs.h                   |  3 ++-
 refs/debug.c             |  5 ++--
 refs/files-backend.c     | 17 ++++++++++----
 refs/packed-backend.c    |  8 ++++++-
 refs/refs-internal.h     |  3 ++-
 refs/reftable-backend.c  |  3 ++-
 t/t0602-reffiles-fsck.sh | 51 ++++++++++++++++++++++++++++++++++++++++
 9 files changed, 90 insertions(+), 15 deletions(-)

diff --git a/builtin/refs.c b/builtin/refs.c
index 24978a7b7b..394b4101c6 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "refs.h"
 #include "strbuf.h"
+#include "worktree.h"
 
 #define REFS_MIGRATE_USAGE \
 	N_("git refs migrate --ref-format=<format> [--dry-run]")
@@ -66,6 +67,7 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
 static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 {
 	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
+	struct worktree **worktrees;
 	const char * const verify_usage[] = {
 		REFS_VERIFY_USAGE,
 		NULL,
@@ -75,7 +77,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
 		OPT_END(),
 	};
-	int ret;
+	int ret = 0;
 
 	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
 	if (argc)
@@ -84,9 +86,13 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
 	git_config(git_fsck_config, &fsck_refs_options);
 	prepare_repo_settings(the_repository);
 
-	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
+	worktrees = get_worktrees();
+	for (size_t i = 0; worktrees[i]; i++)
+		ret |= refs_fsck(get_worktree_ref_store(worktrees[i]),
+				 &fsck_refs_options, worktrees[i]);
 
 	fsck_options_clear(&fsck_refs_options);
+	free_worktrees(worktrees);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 5f729ed412..395a17273c 100644
--- a/refs.c
+++ b/refs.c
@@ -318,9 +318,10 @@ int check_refname_format(const char *refname, int flags)
 	return check_or_sanitize_refname(refname, flags, NULL);
 }
 
-int refs_fsck(struct ref_store *refs, struct fsck_options *o)
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt)
 {
-	return refs->be->fsck(refs, o);
+	return refs->be->fsck(refs, o, wt);
 }
 
 void sanitize_refname_component(const char *refname, struct strbuf *out)
diff --git a/refs.h b/refs.h
index 108dfc93b3..341d43239c 100644
--- a/refs.h
+++ b/refs.h
@@ -549,7 +549,8 @@ int check_refname_format(const char *refname, int flags);
  * reflogs are consistent, and non-zero otherwise. The errors will be
  * written to stderr.
  */
-int refs_fsck(struct ref_store *refs, struct fsck_options *o);
+int refs_fsck(struct ref_store *refs, struct fsck_options *o,
+	      struct worktree *wt);
 
 /*
  * Apply the rules from check_refname_format, but mutate the result until it
diff --git a/refs/debug.c b/refs/debug.c
index 45e2e784a0..72e80ddd6d 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -420,10 +420,11 @@ static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
 }
 
 static int debug_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
-	int res = drefs->refs->be->fsck(drefs->refs, o);
+	int res = drefs->refs->be->fsck(drefs->refs, o, wt);
 	trace_printf_key(&trace_refs, "fsck: %d\n", res);
 	return res;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8edb700568..8bfdce64bc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -23,6 +23,7 @@
 #include "../dir.h"
 #include "../chdir-notify.h"
 #include "../setup.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../revision.h"
@@ -3539,6 +3540,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       struct fsck_options *o,
 			       const char *refs_check_dir,
+			       struct worktree *wt,
 			       files_fsck_refs_fn *fsck_refs_fn)
 {
 	struct strbuf refname = STRBUF_INIT;
@@ -3561,6 +3563,9 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		} else if (S_ISREG(iter->st.st_mode) ||
 			   S_ISLNK(iter->st.st_mode)) {
 			strbuf_reset(&refname);
+
+			if (!is_main_worktree(wt))
+				strbuf_addf(&refname, "worktrees/%s/", wt->id);
 			strbuf_addf(&refname, "%s/%s", refs_check_dir,
 				    iter->relative_path);
 
@@ -3590,7 +3595,8 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 }
 
 static int files_fsck_refs(struct ref_store *ref_store,
-			   struct fsck_options *o)
+			   struct fsck_options *o,
+			   struct worktree *wt)
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
@@ -3599,17 +3605,18 @@ static int files_fsck_refs(struct ref_store *ref_store,
 
 	if (o->verbose)
 		fprintf_ln(stderr, _("Checking references consistency"));
-	return files_fsck_refs_dir(ref_store, o,  "refs", fsck_refs_fn);
+	return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
 }
 
 static int files_fsck(struct ref_store *ref_store,
-		      struct fsck_options *o)
+		      struct fsck_options *o,
+		      struct worktree *wt)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ, "fsck");
 
-	return files_fsck_refs(ref_store, o) |
-	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o);
+	return files_fsck_refs(ref_store, o, wt) |
+	       refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt);
 }
 
 struct ref_storage_be refs_be_files = {
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 07c57fd541..46dcaec654 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -13,6 +13,7 @@
 #include "../lockfile.h"
 #include "../chdir-notify.h"
 #include "../statinfo.h"
+#include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
 #include "../trace2.h"
@@ -1754,8 +1755,13 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 }
 
 static int packed_fsck(struct ref_store *ref_store UNUSED,
-		       struct fsck_options *o UNUSED)
+		       struct fsck_options *o UNUSED,
+		       struct worktree *wt)
 {
+
+	if (!is_main_worktree(wt))
+		return 0;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..037d7991cd 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -653,7 +653,8 @@ typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refnam
 				 struct strbuf *referent);
 
 typedef int fsck_fn(struct ref_store *ref_store,
-		    struct fsck_options *o);
+		    struct fsck_options *o,
+		    struct worktree *wt);
 
 struct ref_storage_be {
 	const char *name;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index f5f957e6de..b6a63c1015 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2443,7 +2443,8 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 }
 
 static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
-			    struct fsck_options *o UNUSED)
+			    struct fsck_options *o UNUSED,
+			    struct worktree *wt UNUSED)
 {
 	return 0;
 }
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 2a172c913d..1e17393a3d 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -107,4 +107,55 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'ref name check should work for multiple worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	cd repo &&
+	test_commit initial &&
+	git checkout -b branch-1 &&
+	test_commit second &&
+	git checkout -b branch-2 &&
+	test_commit third &&
+	git checkout -b branch-3 &&
+	git worktree add ./worktree-1 branch-1 &&
+	git worktree add ./worktree-2 branch-2 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-3
+	) &&
+
+	cp $worktree1_refdir_prefix/branch-4 $worktree1_refdir_prefix/'\'' branch-5'\'' &&
+	cp $worktree2_refdir_prefix/branch-4 $worktree2_refdir_prefix/'\''~branch-6'\'' &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+	error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err &&
+
+	for worktree in "worktree-1" "worktree-2"
+	do
+		(
+			cd $worktree &&
+			test_must_fail git refs verify 2>err &&
+			cat >expect <<-EOF &&
+			error: worktrees/worktree-1/refs/worktree/ branch-5: badRefName: invalid refname format
+			error: worktrees/worktree-2/refs/worktree/~branch-6: badRefName: invalid refname format
+			EOF
+			sort err >sorted_err &&
+			test_cmp expect sorted_err || return 1
+		)
+	done
+'
+
 test_done
-- 
2.47.0



* [PATCH v9 5/9] ref: port git-fsck(1) regular refs check for files backend
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
                                     ` (3 preceding siblings ...)
  2024-11-20 11:51                   ` [PATCH v9 4/9] ref: support multiple worktrees check for refs shejialuo
@ 2024-11-20 11:51                   ` shejialuo
  2024-11-20 11:51                   ` [PATCH v9 6/9] ref: add more strict checks for regular refs shejialuo
                                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:51 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

"git-fsck(1)" implicitly checks the ref content by passing the
callback "fsck_handle_ref" to "refs.c::refs_for_each_rawref". The
callback checks whether the ref content (eventually the "oid") is
valid. If it is not, it reports the following error to the user:

  error: refs/heads/main: invalid sha1 pointer 0000...

However, it also wrongly reports the above error for dangling symrefs
in the repository. This does not align with the behavior of the
"git symbolic-ref" command, which allows users to create dangling
symrefs.

As we have already introduced the "git refs verify" command, we should
check the ref content explicitly there. Later, we could remove these
checks from "git-fsck(1)" and have it launch a "git refs verify"
subprocess instead, which makes "git-fsck(1)" cleaner.

Following what "git-fsck(1)" does, add a similar check to "git refs
verify", and add a new fsck error message "badRefContent(ERROR)" to
report that a ref has invalid content.
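As a rough illustration of what "valid content" means for a regular
loose ref, here is a C sketch. It accepts content that starts with a
full-length hex object name, followed by either the end of the string
or a whitespace byte. This is a hypothetical, simplified model of the
object-id branch of parse_loose_ref_contents(), not Git's code. It
assumes SHA-1 (40 hex digits) and leaves "ref:" symrefs to a separate
check.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define SKETCH_HEXSZ 40 /* assumption: SHA-1; SHA-256 would use 64 */

/*
 * Return 1 if "buf" looks like valid regular ref content: exactly
 * SKETCH_HEXSZ hex digits followed by NUL or whitespace. On success,
 * "*trailing" (if non-NULL) points just past the object name.
 */
static int regular_ref_content_ok(const char *buf, const char **trailing)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return 0; /* too short, or a non-hex character */

	if (buf[i] != '\0' && !isspace((unsigned char)buf[i]))
		return 0; /* e.g. "<oid>x": the name runs on */

	if (trailing)
		*trailing = buf + i;
	return 1;
}
```

Under this model, the test inputs "<oid>x", "xfsazqfxcadas" and
"Xfsazqfxcadas" are all rejected, which would produce a badRefContent
error.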

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  47 +++++++++++++++
 t/t0602-reffiles-fsck.sh      | 105 ++++++++++++++++++++++++++++++++++
 4 files changed, 156 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8bfdce64bc..9f300a7d3c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3505,6 +3505,52 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refname,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *target_name,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct fsck_ref_report report = { 0 };
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	report.path = target_name;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		/*
+		 * Ref file could be removed by another concurrent process. We should
+		 * ignore this error and continue to the next ref.
+		 */
+		if (errno == ENOENT)
+			goto cleanup;
+
+		ret = error_errno(_("cannot read ref file '%s'"), iter->path.buf);
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		strbuf_rtrim(&ref_content);
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "%s", ref_content.buf);
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refname,
@@ -3600,6 +3646,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 1e17393a3d..162370077b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -158,4 +158,109 @@ test_expect_success 'ref name check should work for multiple worktrees' '
 	done
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/branch-bad &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/a/b/branch-bad &&
+		test_cmp expect err || return 1
+	done
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	bad_content_1=$(git rev-parse main)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
+	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
+	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
+test_expect_success 'ref content checks should work with worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree1_refdir_prefix/bad-branch-1 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content
+		EOF
+		rm $worktree1_refdir_prefix/bad-branch-1 &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree2_refdir_prefix/bad-branch-2 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content
+		EOF
+		rm $worktree2_refdir_prefix/bad-branch-2 &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_done
-- 
2.47.0



* [PATCH v9 6/9] ref: add more strict checks for regular refs
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
                                     ` (4 preceding siblings ...)
  2024-11-20 11:51                   ` [PATCH v9 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
@ 2024-11-20 11:51                   ` shejialuo
  2024-11-20 11:52                   ` [PATCH v9 7/9] ref: add basic symref content check for files backend shejialuo
                                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:51 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We already use the "parse_loose_ref_contents" function to check whether
the ref content is valid in the files backend. However,
"parse_loose_ref_contents" allows the ref's content to end with garbage
or without a newline.

Even though we never create such loose refs ourselves, we have always
accepted them, so it is entirely possible that some third-party tools
rely on such loose refs being valid. We should therefore not report an
fsck error for now. Instead, we should notify users about such
"curiously formatted" loose refs so that adequate care is taken before
we decide to tighten the rules in the future.

Reporting an fsck warning is not suitable either. The "--strict" flag
upgrades warnings to errors, and we don't yet want it to generate
errors for such weirdly-formatted reference contents, as we first want
to assess whether this retroactive tightening would cause issues for
any tools out there. Doing so could introduce compatibility issues that
break existing repositories. So, add the following two fsck infos to
represent the situations where the ref content ends without a newline
or has trailing garbage:

1. refMissingNewline(INFO): A loose ref does not end with a newline
   (LF).
2. trailingRefContent(INFO): A loose ref has trailing content.

It might appear that using FSCK_INFO means we cannot warn the user at
all. However, "fsck.c::fsck_vreport" converts FSCK_INFO to FSCK_WARN
when reporting, so we can still warn the user about these situations
when running "git refs verify" without introducing compatibility
issues.
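Why INFO gives the desired behavior can be modeled as two steps, shown
here as a simplified C sketch. It assumes that "--strict" only upgrades
WARN to ERROR when the message type is looked up, and that
fsck_vreport() later prints INFO as a warning. Per-message severities
configured by the user are not modeled. The enum and function names
are hypothetical stand-ins for fsck.c internals.

```c
#include <assert.h>

/* Simplified stand-ins for Git's enum fsck_msg_type. */
enum sketch_msg_type { SK_IGNORE, SK_INFO, SK_WARN, SK_ERROR };

/* Step 1: look up the type; --strict only upgrades WARN to ERROR. */
static enum sketch_msg_type resolve_type(enum sketch_msg_type dflt,
					 int strict)
{
	assert(dflt <= SK_ERROR);
	if (strict && dflt == SK_WARN)
		return SK_ERROR;
	return dflt;
}

/* Step 2: at report time, INFO is shown to the user as a warning. */
static enum sketch_msg_type reported_type(enum sketch_msg_type t)
{
	return t == SK_INFO ? SK_WARN : t;
}
```

With this ordering, an INFO message is printed as "warning:" with or
without "--strict", while a WARN message becomes an error under
"--strict".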

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt | 14 +++++++++
 fsck.h                        |  2 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 26 ++++++++++++++--
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 57 +++++++++++++++++++++++++++++++++--
 6 files changed, 96 insertions(+), 7 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 22c385ea22..6db0eaa84a 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -173,6 +173,20 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A loose ref that does not end with newline(LF). As
+	valid implementations of Git never created such a loose ref
+	file, it may become an error in the future. Report to the
+	git@vger.kernel.org mailing list if you see this error, as
+	we need to know what tools created such a file.
+
+`trailingRefContent`::
+	(INFO) A loose ref has trailing content. As valid implementations
+	of Git never created such a loose ref file, it may become an
+	error in the future. Report to the git@vger.kernel.org mailing
+	list if you see this error, as we need to know what tools
+	created such a file.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 0d99a87911..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -85,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 395a17273c..f88b32a633 100644
--- a/refs.c
+++ b/refs.c
@@ -1789,7 +1789,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9f300a7d3c..3d4d612420 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -569,7 +569,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -606,7 +606,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -628,6 +628,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3513,6 +3517,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
+	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
 	struct object_id oid;
@@ -3537,7 +3542,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
 				     ref_content.buf, &oid, &referent,
-				     &type, &failure_errno)) {
+				     &type, &trailing, &failure_errno)) {
 		strbuf_rtrim(&ref_content);
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
@@ -3545,6 +3550,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
+	if (!(type & REF_ISSYMREF)) {
+		if (!*trailing) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "misses LF at the end");
+			goto cleanup;
+		}
+		if (*trailing != '\n' || *(trailing + 1)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "has trailing garbage: '%s'", trailing);
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 037d7991cd..125f1fe735 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -716,7 +716,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 162370077b..33e7a390ad 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -189,7 +189,48 @@ test_expect_success 'regular ref content should be checked (individual)' '
 		EOF
 		rm $branch_dir_prefix/a/b/branch-bad &&
 		test_cmp expect err || return 1
-	done
+	done &&
+
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	for trailing_content in " garbage" "    more garbage"
+	do
+		printf "%s" "$(git rev-parse main)$trailing_content" >$branch_dir_prefix/branch-garbage &&
+		git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\''$trailing_content'\''
+		EOF
+		rm $branch_dir_prefix/branch-garbage &&
+		test_cmp expect err || return 1
+	done &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
+
+
+	'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage-special &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage-special &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage-special: trailingRefContent: has trailing garbage: '\''
+
+
+	  garbage'\''
+	EOF
+	rm $branch_dir_prefix/branch-garbage-special &&
+	test_cmp expect err
 '
 
 test_expect_success 'regular ref content should be checked (aggregate)' '
@@ -207,12 +248,16 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
 	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
 	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
 	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
 	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	warning: refs/heads/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
@@ -260,7 +305,15 @@ test_expect_success 'ref content checks should work with worktrees' '
 		EOF
 		rm $worktree2_refdir_prefix/bad-branch-2 &&
 		test_cmp expect err || return 1
-	done
+	done &&
+
+	printf "%s" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err
 '
 
 test_done
-- 
2.47.0



* [PATCH v9 7/9] ref: add basic symref content check for files backend
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
                                     ` (5 preceding siblings ...)
  2024-11-20 11:51                   ` [PATCH v9 6/9] ref: add more strict checks for regular refs shejialuo
@ 2024-11-20 11:52                   ` shejialuo
  2024-11-20 11:52                   ` [PATCH v9 8/9] ref: check whether the target of the symref is a ref shejialuo
                                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:52 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

We have code that checks regular ref contents, but we do not yet check
the contents of symbolic refs. By using "parse_loose_ref_contents" for
symbolic refs, we get the "referent".

We do not need to check the "referent" by opening the file. If the
"referent" exists in the filesystem, we will eventually check its
correctness anyway while inspecting every file in the "refs" directory.
If it does not exist, that is fine, as it is treated as a dangling
symref.

So we only need to check the content of the "referent" string. A ref
file is accepted as a textual symref if it begins with "ref:", followed
by zero or more whitespace characters, followed by the full refname,
followed only by whitespace characters. However, we always write a
single SP after "ref:" and a single LF after the refname. It may seem
that we should report an fsck error when the "referent" does not follow
these rules, but we should not be so aggressive, because third-party
reimplementations of Git may have taken advantage of the looser syntax.
More specifically, we accept the following contents:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

When introducing the regular ref content checks, we created two fsck
infos, "refMissingNewline" and "trailingRefContent", which represent
exactly the above situations. So we reuse these two fsck messages to
inform the user about these situations.

But we do not allow any other trailing garbage. The following are bad
symref contents, which will be reported as fsck errors by "git-fsck(1)":

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

And we introduce a new "badReferentName(ERROR)" fsck message to report
the above errors, using "is_root_ref" and "check_refname_format" to
check the "referent". Since neither "is_root_ref" nor
"check_refname_format" accepts whitespace, we pass them the trimmed
version of "referent".
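To illustrate why the whitespace has to be trimmed before the format check, here is a simplified, hypothetical C approximation of a few of the rules that "check_refname_format" enforces when called with flags 0. It is not Git's implementation and omits several rules; the name "refname_looks_valid" is made up for this sketch. A name containing a space, such as "refs/heads/master garbage", fails, and so does a one-level name such as "HEAD", which is why "is_root_ref" is consulted first.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, simplified subset of the rules applied by
 * check_refname_format(name, 0). Returns 1 if "name" passes
 * these rules, 0 otherwise.
 */
static int refname_looks_valid(const char *name)
{
	const char *component = name;
	const char *p;
	int components = 1;

	assert(name != NULL);
	if (!*name)
		return 0;

	for (p = name; ; p++) {
		unsigned char ch = (unsigned char)*p;

		if (ch == '/' || ch == '\0') {
			size_t len = (size_t)(p - component);

			/* Empty component: leading '/', "//" or trailing '/'. */
			if (len == 0)
				return 0;
			/* A component must not begin with '.' ... */
			if (component[0] == '.')
				return 0;
			/* ... nor end with ".lock". */
			if (len >= 5 && !strncmp(p - 5, ".lock", 5))
				return 0;
			if (ch == '\0')
				break;
			components++;
			component = p + 1;
			continue;
		}
		/* Control characters, space and DEL are forbidden. */
		if (ch <= ' ' || ch == 0x7f)
			return 0;
		if (strchr("~^:?*[\\", ch))
			return 0;
		if (ch == '.' && p[1] == '.')
			return 0;
		if (ch == '@' && p[1] == '{')
			return 0;
	}

	/* The whole name must not end with '.'. */
	if (p[-1] == '.')
		return 0;
	/* Without REFNAME_ALLOW_ONELEVEL, names like "HEAD" are rejected. */
	return components >= 2;
}
```

In this sketch, the untrimmed referent "refs/heads/branch\n" would be rejected because of the LF, while the trimmed "refs/heads/branch" passes.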

To add these checks, we do the following:

1. Record the untrimmed length "orig_len" and the untrimmed last byte
   "orig_last_byte".
2. Use "strbuf_rtrim" to trim trailing whitespace and newlines so that
   "is_root_ref" and "check_refname_format" do not fail because of them.
3. Use "orig_len" and "orig_last_byte" to check whether the "referent"
   misses '\n' at the end or has trailing whitespace or newlines.
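The three steps above can be sketched in standalone C, using a plain writable string in place of a strbuf. This is a minimal sketch under stated assumptions: the input is the non-empty text left after "ref:" and its leading whitespace, and "struct symref_trailing" and "check_symref_trailing" are hypothetical names; Git reports through fsck_report_ref() instead of returning flags. The two conditions are copied from "files_fsck_symref_target" in the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result type; Git itself reports via fsck_report_ref(). */
struct symref_trailing {
	int missing_newline;  /* would map to refMissingNewline */
	int trailing_content; /* would map to trailingRefContent */
};

/*
 * "referent" is assumed to be the writable, non-empty text following
 * "ref:" and its leading whitespace. It is right-trimmed in place.
 */
static struct symref_trailing check_symref_trailing(char *referent)
{
	struct symref_trailing res = { 0, 0 };
	size_t orig_len, len;
	char orig_last_byte;

	/* Step 1: remember the untrimmed length and last byte. */
	orig_len = strlen(referent);
	assert(orig_len > 0);
	orig_last_byte = referent[orig_len - 1];

	/* Step 2: trim trailing whitespace, similar to strbuf_rtrim(). */
	len = orig_len;
	while (len > 0 && isspace((unsigned char)referent[len - 1]))
		len--;
	referent[len] = '\0';

	/* Step 3: the same conditions as in the patch. */
	if (len == orig_len || (len < orig_len && orig_last_byte != '\n'))
		res.missing_newline = 1;
	if (len != orig_len && len != orig_len - 1)
		res.trailing_content = 1;

	return res;
}
```

Only a single trailing LF is silent; no newline at all yields refMissingNewline, and anything more than one trimmed byte yields trailingRefContent. Garbage such as "refs/heads/master garbage\n" passes these two checks but is caught by the refname format check on the trimmed string.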

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  40 ++++++++++++
 t/t0602-reffiles-fsck.sh      | 111 ++++++++++++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 6db0eaa84a..dcea05edfc 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..5227dfdef2 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 3d4d612420..f4342e3f3e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3509,6 +3509,43 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refname,
 				  struct dir_iterator *iter);
 
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent)
+{
+	char orig_last_byte;
+	size_t orig_len;
+	int ret = 0;
+
+	orig_len = referent->len;
+	orig_last_byte = referent->buf[orig_len - 1];
+	strbuf_rtrim(referent);
+
+	if (!is_root_ref(referent->buf) &&
+	    check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_NAME,
+				      "points to invalid refname '%s'", referent->buf);
+		goto out;
+	}
+
+	if (referent->len == orig_len ||
+	    (referent->len < orig_len && orig_last_byte != '\n')) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "misses LF at the end");
+	}
+
+	if (referent->len != orig_len && referent->len != orig_len - 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "has trailing whitespaces or newlines");
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *target_name,
@@ -3563,6 +3600,9 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "has trailing garbage: '%s'", trailing);
 			goto cleanup;
 		}
+	} else {
+		ret = files_fsck_symref_target(o, &report, &referent);
+		goto cleanup;
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 33e7a390ad..ee1e5f2864 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -263,6 +263,109 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	for good_referent in "refs/heads/branch" "HEAD"
+	do
+		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
+		git refs verify 2>err &&
+		rm $branch_dir_prefix/branch-good &&
+		test_must_be_empty err || return 1
+	done &&
+
+	for bad_referent in "refs/heads/.branch" "refs/heads/~branch" "refs/heads/?branch"
+	do
+		printf "ref: %s\n" $bad_referent >$branch_dir_prefix/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-bad: badReferentName: points to invalid refname '\''$bad_referent'\''
+		EOF
+		rm $branch_dir_prefix/branch-bad &&
+		test_cmp expect err || return 1
+	done &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: HEAD\n" >$branch_dir_prefix/branch-head &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to invalid refname '\''refs/heads/.branch'\''
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: has trailing whitespaces or newlines
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: misses LF at the end
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -313,6 +416,14 @@ test_expect_success 'ref content checks should work with worktrees' '
 	warning: worktrees/worktree-1/refs/worktree/branch-no-newline: refMissingNewline: misses LF at the end
 	EOF
 	rm $worktree1_refdir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse HEAD)" >$worktree1_refdir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-garbage: trailingRefContent: has trailing garbage: '\'' garbage'\''
+	EOF
+	rm $worktree1_refdir_prefix/branch-garbage &&
 	test_cmp expect err
 '
 
-- 
2.47.0



* [PATCH v9 8/9] ref: check whether the target of the symref is a ref
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
                                     ` (6 preceding siblings ...)
  2024-11-20 11:52                   ` [PATCH v9 7/9] ref: add basic symref content check for files backend shejialuo
@ 2024-11-20 11:52                   ` shejialuo
  2024-11-20 11:52                   ` [PATCH v9 9/9] ref: add symlink ref content check for files backend shejialuo
  2024-11-20 14:26                   ` [PATCH v9 0/9] add " Patrick Steinhardt
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:52 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Ideally, we want users to use "git symbolic-ref" to create symrefs
instead of writing raw contents into the filesystem. However, "git
symbolic-ref" is strict with the refname but not with the referent. For
example, we can make the "referent" point to "$(gitdir)/logs/aaa",
manually write an object ID into that file, and still successfully
resolve the symref with "git rev-parse".

  $ git init repo && cd repo && git commit --allow-empty -mx
  $ git symbolic-ref refs/heads/test logs/aaa
  $ echo $(git rev-parse HEAD) > .git/logs/aaa
  $ git rev-parse test

We may need to add some restrictions on the "referent" parameter of
"git symbolic-ref", because ideally all non-pseudo refs should be
located under the "refs" directory, and we may tighten this in the
future.

To tell the user that we may tighten the rules for the above situation,
create a new fsck message "symrefTargetIsNotARef" to notify the user
that this may become an error in the future.
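The new classification can be sketched in C as follows. This is only an approximation: "looks_like_root_ref" is a hypothetical stand-in for "is_root_ref" that accepts "HEAD" and all-uppercase names ending in "_HEAD", whereas Git's helper has its own syntax rules and a short list of irregular root refs. The prefix tests mirror the two starts_with() calls in the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical approximation of is_root_ref(): "HEAD", or a name made
 * only of uppercase letters and '_' that ends in "_HEAD".
 */
static int looks_like_root_ref(const char *name)
{
	size_t len = strlen(name);
	const char *p;

	if (!strcmp(name, "HEAD"))
		return 1;
	for (p = name; *p; p++)
		if (!((*p >= 'A' && *p <= 'Z') || *p == '_'))
			return 0;
	return len > 5 && !strcmp(name + len - 5, "_HEAD");
}

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/* Returns 1 when symrefTargetIsNotARef would be reported. */
static int symref_target_is_not_a_ref(const char *referent)
{
	assert(referent != NULL);
	return !looks_like_root_ref(referent) &&
	       !has_prefix(referent, "refs/") &&
	       !has_prefix(referent, "worktrees/");
}
```

Note that "refs-back/heads/branch" is flagged because the prefix test includes the trailing '/', matching the cases exercised by the new test.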

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 +++++++++
 fsck.h                        |  1 +
 refs/files-backend.c          | 14 ++++++++++++--
 t/t0602-reffiles-fsck.sh      | 29 +++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index dcea05edfc..f82ebc58e8 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,15 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symrefTargetIsNotARef`::
+	(INFO) The target of a symbolic reference points neither to
+	a root reference nor to a reference starting with "refs/".
+	Although `git symbolic-ref` allows creating a symref whose referent
+	is outside the "refs" directory, we may tighten
+	the rule in the future. Report to the git@vger.kernel.org
+	mailing list if you see this error, as we need to know what tools
+	created such a file.
+
 `trailingRefContent`::
 	(INFO) A loose ref has trailing content. As valid implementations
 	of Git never created such a loose ref file, it may become an
diff --git a/fsck.h b/fsck.h
index 5227dfdef2..53a47612e6 100644
--- a/fsck.h
+++ b/fsck.h
@@ -87,6 +87,7 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index f4342e3f3e..c2b99fdf40 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3513,6 +3513,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    struct strbuf *referent)
 {
+	int is_referent_root;
 	char orig_last_byte;
 	size_t orig_len;
 	int ret = 0;
@@ -3521,8 +3522,17 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	orig_last_byte = referent->buf[orig_len - 1];
 	strbuf_rtrim(referent);
 
-	if (!is_root_ref(referent->buf) &&
-	    check_refname_format(referent->buf, 0)) {
+	is_referent_root = is_root_ref(referent->buf);
+	if (!is_referent_root &&
+	    !starts_with(referent->buf, "refs/") &&
+	    !starts_with(referent->buf, "worktrees/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF,
+				      "points to non-ref target '%s'", referent->buf);
+
+	}
+
+	if (!is_referent_root && check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT_NAME,
 				      "points to invalid refname '%s'", referent->buf);
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index ee1e5f2864..692b30727a 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -366,6 +366,35 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'the target of the textual symref should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	for good_referent in "refs/heads/branch" "HEAD" "refs/tags/tag"
+	do
+		printf "ref: %s\n" $good_referent >$branch_dir_prefix/branch-good &&
+		git refs verify 2>err &&
+		rm $branch_dir_prefix/branch-good &&
+		test_must_be_empty err || return 1
+	done &&
+
+	for nonref_referent in "refs-back/heads/branch" "refs-back/tags/tag" "reflogs/refs/heads/branch"
+	do
+		printf "ref: %s\n" $nonref_referent >$branch_dir_prefix/branch-bad-1 &&
+		git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: refs/heads/branch-bad-1: symrefTargetIsNotARef: points to non-ref target '\''$nonref_referent'\''
+		EOF
+		rm $branch_dir_prefix/branch-bad-1 &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0



* [PATCH v9 9/9] ref: add symlink ref content check for files backend
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
                                     ` (7 preceding siblings ...)
  2024-11-20 11:52                   ` [PATCH v9 8/9] ref: check whether the target of the symref is a ref shejialuo
@ 2024-11-20 11:52                   ` shejialuo
  2024-11-20 14:26                   ` [PATCH v9 0/9] add " Patrick Steinhardt
  9 siblings, 0 replies; 209+ messages in thread
From: shejialuo @ 2024-11-20 11:52 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano

Besides textual symrefs, we also allow symbolic links as symrefs. So we
should provide the same consistency checks as we do for textual
symrefs. We are also considering deprecating writing symbolic links,
but we first need to assess whether symbolic links are still used. So
add a new fsck message "symlinkRef(INFO)" to make the user aware of
this.

We have already introduced "files_fsck_symref_target". We should reuse
this function to handle symrefs that use legacy symbolic links. We
should not check for trailing garbage in symbolic links. Add a new
parameter "symbolic_link" to disable the checks that should only be
executed for textual symrefs.

We also need to generate the "referent" parameter for reusing
"files_fsck_symref_target", by the following steps:

1. Use "strbuf_add_real_path" to resolve the symlink and get the
   absolute path "ref_content" that the symlink ref points to.
2. Generate the absolute path "abs_gitdir" of "gitdir", and strip the
   "abs_gitdir" prefix from "ref_content" to extract the relative path
   "relative_referent_path".
3. If "ref_content" is outside of "gitdir", we just set "referent" to
   "ref_content". Otherwise, we set "referent" to
   "relative_referent_path".
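Steps 2 and 3 can be sketched in plain C as below. The function name "derive_symlink_referent" and its fixed-size buffer are hypothetical; the sketch assumes step 1 (resolving the link, which the patch does with "strbuf_add_real_path") has already produced an absolute, canonical path, and that "abs_gitdir" is already absolute and normalized.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: write the referent for a symlink whose resolved
 * target is "resolved" into "out". Returns 0 on success, -1 if a
 * buffer is too small.
 */
static int derive_symlink_referent(const char *abs_gitdir,
				   const char *resolved,
				   char *out, size_t out_size)
{
	char prefix[4096];
	size_t len = strlen(abs_gitdir);
	const char *referent = resolved;
	int n;

	assert(abs_gitdir != NULL && resolved != NULL && out != NULL);

	/* Ensure the gitdir prefix ends with '/', as the patch does. */
	n = snprintf(prefix, sizeof(prefix), "%s%s", abs_gitdir,
		     (len && abs_gitdir[len - 1] == '/') ? "" : "/");
	if (n < 0 || (size_t)n >= sizeof(prefix))
		return -1;

	/* Inside gitdir: use the relative path; otherwise keep it absolute. */
	if (!strncmp(resolved, prefix, (size_t)n))
		referent = resolved + n;

	n = snprintf(out, out_size, "%s", referent);
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Appending '/' before the prefix comparison keeps a sibling directory such as "/tmp/repo/.gitx" from being mistaken for a path inside "/tmp/repo/.git".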

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   6 ++
 fsck.h                        |   1 +
 refs/files-backend.c          |  38 ++++++++-
 t/t0602-reffiles-fsck.sh      | 141 ++++++++++++++++++++++++++++++++++
 4 files changed, 182 insertions(+), 4 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index f82ebc58e8..b14bc44ca4 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -183,6 +183,12 @@
 	git@vger.kernel.org mailing list if you see this error, as
 	we need to know what tools created such a file.
 
+`symlinkRef`::
+	(INFO) A symbolic link is used as a symref. Report to the
+	git@vger.kernel.org mailing list if you see this error, as we
+	are assessing the feasibility of dropping the support for
+	creating symbolic links as symrefs.
+
 `symrefTargetIsNotARef`::
 	(INFO) The target of a symbolic reference points neither to
 	a root reference nor to a reference starting with "refs/".
diff --git a/fsck.h b/fsck.h
index 53a47612e6..a44c231a5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -86,6 +86,7 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
 	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index c2b99fdf40..ea5961e48c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,6 +1,7 @@
 #define USE_THE_REPOSITORY_VARIABLE
 
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../config.h"
 #include "../copy.h"
 #include "../environment.h"
@@ -3511,7 +3512,8 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
-				    struct strbuf *referent)
+				    struct strbuf *referent,
+				    unsigned int symbolic_link)
 {
 	int is_referent_root;
 	char orig_last_byte;
@@ -3520,7 +3522,8 @@ static int files_fsck_symref_target(struct fsck_options *o,
 
 	orig_len = referent->len;
 	orig_last_byte = referent->buf[orig_len - 1];
-	strbuf_rtrim(referent);
+	if (!symbolic_link)
+		strbuf_rtrim(referent);
 
 	is_referent_root = is_root_ref(referent->buf);
 	if (!is_referent_root &&
@@ -3539,6 +3542,9 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
+	if (symbolic_link)
+		goto out;
+
 	if (referent->len == orig_len ||
 	    (referent->len < orig_len && orig_last_byte != '\n')) {
 		ret = fsck_report_ref(o, report,
@@ -3562,6 +3568,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct dir_iterator *iter)
 {
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct fsck_ref_report report = { 0 };
 	const char *trailing = NULL;
@@ -3572,8 +3579,30 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	report.path = target_name;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char *relative_referent_path = NULL;
+
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->repo->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&ref_content, iter->path.buf);
+		skip_prefix(ref_content.buf, abs_gitdir.buf,
+			    &relative_referent_path);
+
+		if (relative_referent_path)
+			strbuf_addstr(&referent, relative_referent_path);
+		else
+			strbuf_addbuf(&referent, &ref_content);
+
+		ret |= files_fsck_symref_target(o, &report, &referent, 1);
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		/*
@@ -3611,13 +3640,14 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 			goto cleanup;
 		}
 	} else {
-		ret = files_fsck_symref_target(o, &report, &referent);
+		ret = files_fsck_symref_target(o, &report, &referent, 0);
 		goto cleanup;
 	}
 
 cleanup:
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 692b30727a..f8f27cfc6c 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -395,6 +395,147 @@ test_expect_success 'the target of the textual symref should be checked' '
 	done
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-escape $branch_dir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   " $branch_dir_prefix/branch-symbolic-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-bad: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-bad: badReferentName: points to invalid refname '\''refs/heads/branch   '\''
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-bad &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferentName: points to invalid refname '\''refs/tags/.tag'\''
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err
+'
+
+test_expect_success SYMLINKS 'symlink symref content should be checked (worktree)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	main_worktree_refdir_prefix=.git/refs/heads &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	ln -sf ../../../../refs/heads/good-branch $worktree1_refdir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $worktree1_refdir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../../../worktrees/worktree-1/good-branch $worktree2_refdir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-2/refs/worktree/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $worktree2_refdir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../worktrees/worktree-2/good-branch $main_worktree_refdir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $main_worktree_refdir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../../../logs/branch-escape $worktree1_refdir_prefix/branch-symbolic &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: worktrees/worktree-1/refs/worktree/branch-symbolic: symlinkRef: use deprecated symbolic link for symref
+	warning: worktrees/worktree-1/refs/worktree/branch-symbolic: symrefTargetIsNotARef: points to non-ref target '\''logs/branch-escape'\''
+	EOF
+	rm $worktree1_refdir_prefix/branch-symbolic &&
+	test_cmp expect err &&
+
+	for bad_referent_name in ".tag" "branch   "
+	do
+		ln -sf ./"$bad_referent_name" $worktree1_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-1/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-1/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''worktrees/worktree-1/refs/worktree/$bad_referent_name'\''
+		EOF
+		rm $worktree1_refdir_prefix/bad-symbolic &&
+		test_cmp expect err &&
+
+		ln -sf ../../../../refs/heads/"$bad_referent_name" $worktree1_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-1/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-1/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''refs/heads/$bad_referent_name'\''
+		EOF
+		rm $worktree1_refdir_prefix/bad-symbolic &&
+		test_cmp expect err &&
+
+		ln -sf ./"$bad_referent_name" $worktree2_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-2/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-2/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''worktrees/worktree-2/refs/worktree/$bad_referent_name'\''
+		EOF
+		rm $worktree2_refdir_prefix/bad-symbolic &&
+		test_cmp expect err &&
+
+		ln -sf ../../../../refs/heads/"$bad_referent_name" $worktree2_refdir_prefix/bad-symbolic &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		warning: worktrees/worktree-2/refs/worktree/bad-symbolic: symlinkRef: use deprecated symbolic link for symref
+		error: worktrees/worktree-2/refs/worktree/bad-symbolic: badReferentName: points to invalid refname '\''refs/heads/$bad_referent_name'\''
+		EOF
+		rm $worktree2_refdir_prefix/bad-symbolic &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_expect_success 'ref content checks should work with worktrees' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.47.0



* Re: [PATCH v9 0/9] add ref content check for files backend
  2024-11-20 11:47                 ` [PATCH v9 " shejialuo
                                     ` (8 preceding siblings ...)
  2024-11-20 11:52                   ` [PATCH v9 9/9] ref: add symlink ref content check for files backend shejialuo
@ 2024-11-20 14:26                   ` Patrick Steinhardt
  2024-11-20 23:21                     ` Junio C Hamano
  9 siblings, 1 reply; 209+ messages in thread
From: Patrick Steinhardt @ 2024-11-20 14:26 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Junio C Hamano

On Wed, Nov 20, 2024 at 07:47:04PM +0800, shejialuo wrote:
> Hi All:
> 
> This version fixes two problems:
> 
> 1. Remove unnecessary space.
> 2. Drop extra "strerror(errno)".
> 
> Thanks,
> Jialuo

The range-diff looks as expected, so this version looks good to me.

Thanks!

Patrick


* Re: [PATCH v9 0/9] add ref content check for files backend
  2024-11-20 14:26                   ` [PATCH v9 0/9] add " Patrick Steinhardt
@ 2024-11-20 23:21                     ` Junio C Hamano
  0 siblings, 0 replies; 209+ messages in thread
From: Junio C Hamano @ 2024-11-20 23:21 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: shejialuo, git, Karthik Nayak

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Nov 20, 2024 at 07:47:04PM +0800, shejialuo wrote:
>> Hi All:
>> 
>> This version fixes two problems:
>> 
>> 1. Remove unnecessary space.
>> 2. Drop extra "strerror(errno)".
>> 
>> Thanks,
>> Jialuo
>
> The range-diff looks as expected, so this version looks good to me.

Thanks, both.  Looking good.


end of thread, other threads:[~2024-11-20 23:21 UTC | newest]

Thread overview: 209+ messages
2024-08-13 14:18 [RFC] Implement ref content consistency check shejialuo
2024-08-15 10:19 ` karthik nayak
2024-08-15 13:37   ` shejialuo
2024-08-16  9:06     ` Patrick Steinhardt
2024-08-16 16:39       ` Junio C Hamano
2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
2024-08-18 15:01   ` [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro shejialuo
2024-08-20 16:25     ` Junio C Hamano
2024-08-21 12:49       ` shejialuo
2024-08-18 15:01   ` [PATCH v1 2/4] ref: add regular ref content check for files backend shejialuo
2024-08-20 16:49     ` Junio C Hamano
2024-08-21 14:21       ` shejialuo
2024-08-22  8:46       ` Patrick Steinhardt
2024-08-22 16:13         ` Junio C Hamano
2024-08-22 16:17           ` Junio C Hamano
2024-08-23  7:21             ` Patrick Steinhardt
2024-08-23 11:30               ` shejialuo
2024-08-22  8:48     ` Patrick Steinhardt
2024-08-22 12:06       ` shejialuo
2024-08-18 15:01   ` [PATCH v1 3/4] ref: add symbolic " shejialuo
2024-08-22  8:53     ` Patrick Steinhardt
2024-08-22 12:42       ` shejialuo
2024-08-23  5:36         ` Patrick Steinhardt
2024-08-23 11:37           ` shejialuo
2024-08-18 15:02   ` [PATCH v1 4/4] ref: add symlink ref consistency " shejialuo
2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
2024-08-27 16:07     ` [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
2024-08-27 17:49       ` Junio C Hamano
2024-08-27 16:07     ` [PATCH v2 2/4] ref: add regular ref content check for files backend shejialuo
2024-08-27 16:19       ` shejialuo
2024-08-27 18:21       ` Junio C Hamano
2024-08-28 12:50         ` Patrick Steinhardt
2024-08-28 16:32           ` Junio C Hamano
2024-08-29 10:19             ` Patrick Steinhardt
2024-08-28 14:31         ` shejialuo
2024-08-28 16:45           ` Junio C Hamano
2024-08-28 12:50       ` Patrick Steinhardt
2024-08-28 14:41         ` shejialuo
2024-08-28 15:30         ` Junio C Hamano
2024-08-27 16:08     ` [PATCH v2 3/4] ref: add symbolic " shejialuo
2024-08-27 19:19       ` Junio C Hamano
2024-08-28 15:26         ` shejialuo
2024-08-28 12:50       ` Patrick Steinhardt
2024-08-28 15:36         ` shejialuo
2024-08-28 15:41         ` Junio C Hamano
2024-08-29 10:11           ` Patrick Steinhardt
2024-08-27 16:08     ` [PATCH v2 4/4] ref: add symlink ref " shejialuo
2024-08-28 18:42     ` [PATCH] SQUASH??? remove unused parameters Junio C Hamano
2024-08-28 21:28     ` [PATCH v2 0/4] add ref content check for files backend Junio C Hamano
2024-08-29  4:02       ` Jeff King
2024-08-29  4:59         ` Junio C Hamano
2024-08-29  7:00           ` Patrick Steinhardt
2024-08-29 15:07             ` Junio C Hamano
2024-08-29 19:48             ` Jeff King
2024-08-29 15:48           ` shejialuo
2024-08-29 16:12             ` Junio C Hamano
2024-08-29 15:00         ` [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED Junio C Hamano
2024-08-29 17:52           ` Jeff King
2024-08-29 18:06             ` Junio C Hamano
2024-08-29 18:18               ` [PATCH v2] " Junio C Hamano
2024-08-29 18:27                 ` [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__ Junio C Hamano
2024-08-29 19:45                   ` Jeff King
2024-08-29 20:19                     ` Junio C Hamano
2024-08-29 19:40                 ` [PATCH v2] CodingGuidelines: also mention MAYBE_UNUSED Jeff King
2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
2024-09-03 12:20       ` [PATCH v3 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
2024-09-03 12:20       ` [PATCH v3 2/4] ref: add regular ref content check for files backend shejialuo
2024-09-09 15:04         ` Patrick Steinhardt
2024-09-10  7:42           ` shejialuo
2024-09-10 16:07         ` karthik nayak
2024-09-13 10:25           ` shejialuo
2024-09-03 12:20       ` [PATCH v3 3/4] ref: add symref " shejialuo
2024-09-09 15:04         ` Patrick Steinhardt
2024-09-10  8:02           ` shejialuo
2024-09-10 22:19         ` karthik nayak
2024-09-12  4:00           ` shejialuo
2024-09-03 12:21       ` [PATCH v3 4/4] ref: add symlink ref " shejialuo
2024-09-09 15:04         ` Patrick Steinhardt
2024-09-10  8:28           ` shejialuo
2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
2024-09-13 17:17         ` [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero shejialuo
2024-09-18 16:41           ` Junio C Hamano
2024-09-13 17:17         ` [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-09-18 18:59           ` Junio C Hamano
2024-09-22 14:58             ` shejialuo
2024-09-13 17:17         ` [PATCH v4 3/5] ref: add more strict checks for regular refs shejialuo
2024-09-18 19:39           ` Junio C Hamano
2024-09-22 15:06             ` shejialuo
2024-09-22 16:48               ` Junio C Hamano
2024-09-13 17:18         ` [PATCH v4 4/5] ref: add symref content check for files backend shejialuo
2024-09-18 20:19           ` Junio C Hamano
2024-09-22 15:53             ` shejialuo
2024-09-22 16:55               ` Junio C Hamano
2024-09-13 17:18         ` [PATCH v4 5/5] ref: add symlink ref " shejialuo
2024-09-18 23:02           ` Junio C Hamano
2024-09-18 16:49         ` [PATCH v4 0/5] add " Junio C Hamano
2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
2024-09-29  7:15           ` [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-10-08  7:29             ` Karthik Nayak
2024-09-29  7:15           ` [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:42               ` shejialuo
2024-10-07  9:16                 ` Patrick Steinhardt
2024-10-07 12:06                   ` shejialuo
2024-09-29  7:15           ` [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:42               ` shejialuo
2024-10-07  9:18                 ` Patrick Steinhardt
2024-10-07 12:08                   ` shejialuo
2024-10-08  7:43             ` Karthik Nayak
2024-10-08 12:24               ` shejialuo
2024-10-08 17:44                 ` Junio C Hamano
2024-10-09  8:05                   ` Patrick Steinhardt
2024-10-09 11:59                     ` shejialuo
2024-10-10  6:52                       ` Patrick Steinhardt
2024-10-10 16:00                         ` Junio C Hamano
2024-10-09 11:55                   ` shejialuo
2024-09-29  7:16           ` [PATCH v5 4/9] ref: add more strict checks for regular refs shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:44               ` shejialuo
2024-10-07  9:25                 ` Patrick Steinhardt
2024-10-07 12:19                   ` shejialuo
2024-09-29  7:16           ` [PATCH v5 5/9] ref: add basic symref content check for files backend shejialuo
2024-10-08  7:58             ` Karthik Nayak
2024-10-08 12:18               ` shejialuo
2024-09-29  7:16           ` [PATCH v5 6/9] ref: add escape check for the referent of symref shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:44               ` shejialuo
2024-10-07  9:26                 ` Patrick Steinhardt
2024-09-29  7:17           ` [PATCH v5 7/9] ref: enhance escape situation for worktrees shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:45               ` shejialuo
2024-09-29  7:17           ` [PATCH v5 8/9] t0602: add ref content checks " shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:45               ` shejialuo
2024-09-29  7:17           ` [PATCH v5 9/9] ref: add symlink ref content check for files backend shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:45               ` shejialuo
2024-09-30 18:57           ` [PATCH v5 0/9] add " Junio C Hamano
2024-10-01  3:40             ` shejialuo
2024-10-07 12:49           ` shejialuo
2024-10-21 13:32           ` [PATCH v6 " shejialuo
2024-10-21 13:34             ` [PATCH v6 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-10-21 13:34             ` [PATCH v6 2/9] ref: check the full refname instead of basename shejialuo
2024-10-21 15:38               ` karthik nayak
2024-10-22 11:42                 ` shejialuo
2024-11-05  7:11               ` Patrick Steinhardt
2024-11-06 12:37                 ` shejialuo
2024-10-21 13:34             ` [PATCH v6 3/9] ref: initialize target name outside of check functions shejialuo
2024-10-21 15:49               ` karthik nayak
2024-11-05  7:11               ` Patrick Steinhardt
2024-11-06 12:32                 ` shejialuo
2024-11-06 13:14                   ` Patrick Steinhardt
2024-10-21 13:34             ` [PATCH v6 4/9] ref: support multiple worktrees check for refs shejialuo
2024-10-21 15:56               ` karthik nayak
2024-10-22 11:44                 ` shejialuo
2024-11-05  7:11               ` Patrick Steinhardt
2024-11-05 12:52                 ` shejialuo
2024-11-06  6:34                   ` Patrick Steinhardt
2024-11-06 12:20                     ` shejialuo
2024-10-21 13:34             ` [PATCH v6 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-05  7:11               ` Patrick Steinhardt
2024-10-21 13:34             ` [PATCH v6 6/9] ref: add more strict checks for regular refs shejialuo
2024-10-21 13:35             ` [PATCH v6 7/9] ref: add basic symref content check for files backend shejialuo
2024-10-21 13:35             ` [PATCH v6 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-10-21 13:35             ` [PATCH v6 9/9] ref: add symlink ref content check for files backend shejialuo
2024-10-21 16:09             ` [PATCH v6 0/9] add " Taylor Blau
2024-10-22 11:41               ` shejialuo
2024-10-21 16:18             ` Taylor Blau
2024-11-10 12:07             ` [PATCH v7 " shejialuo
2024-11-10 12:09               ` [PATCH v7 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-11-10 12:09               ` [PATCH v7 2/9] ref: check the full refname instead of basename shejialuo
2024-11-10 12:09               ` [PATCH v7 3/9] ref: initialize ref name outside of check functions shejialuo
2024-11-10 12:09               ` [PATCH v7 4/9] ref: support multiple worktrees check for refs shejialuo
2024-11-10 12:09               ` [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-13  7:36                 ` Patrick Steinhardt
2024-11-14 12:09                   ` shejialuo
2024-11-10 12:10               ` [PATCH v7 6/9] ref: add more strict checks for regular refs shejialuo
2024-11-10 12:10               ` [PATCH v7 7/9] ref: add basic symref content check for files backend shejialuo
2024-11-10 12:10               ` [PATCH v7 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-11-10 12:10               ` [PATCH v7 9/9] ref: add symlink ref content check for files backend shejialuo
2024-11-13  7:36                 ` Patrick Steinhardt
2024-11-14 12:18                   ` shejialuo
2024-11-13  7:36               ` [PATCH v7 0/9] add " Patrick Steinhardt
2024-11-14 16:51               ` [PATCH v8 " shejialuo
2024-11-14 16:53                 ` [PATCH v8 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-11-14 16:54                 ` [PATCH v8 2/9] ref: check the full refname instead of basename shejialuo
2024-11-14 16:54                 ` [PATCH v8 3/9] ref: initialize ref name outside of check functions shejialuo
2024-11-14 16:54                 ` [PATCH v8 4/9] ref: support multiple worktrees check for refs shejialuo
2024-11-14 16:54                 ` [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-15  7:11                   ` Patrick Steinhardt
2024-11-15 11:08                     ` shejialuo
2024-11-14 16:54                 ` [PATCH v8 6/9] ref: add more strict checks for regular refs shejialuo
2024-11-14 16:54                 ` [PATCH v8 7/9] ref: add basic symref content check for files backend shejialuo
2024-11-14 16:54                 ` [PATCH v8 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-11-14 16:55                 ` [PATCH v8 9/9] ref: add symlink ref content check for files backend shejialuo
2024-11-15 11:10                 ` [PATCH v8 0/9] add " shejialuo
2024-11-20 11:47                 ` [PATCH v9 " shejialuo
2024-11-20 11:51                   ` [PATCH v9 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-11-20 11:51                   ` [PATCH v9 2/9] ref: check the full refname instead of basename shejialuo
2024-11-20 11:51                   ` [PATCH v9 3/9] ref: initialize ref name outside of check functions shejialuo
2024-11-20 11:51                   ` [PATCH v9 4/9] ref: support multiple worktrees check for refs shejialuo
2024-11-20 11:51                   ` [PATCH v9 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-20 11:51                   ` [PATCH v9 6/9] ref: add more strict checks for regular refs shejialuo
2024-11-20 11:52                   ` [PATCH v9 7/9] ref: add basic symref content check for files backend shejialuo
2024-11-20 11:52                   ` [PATCH v9 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-11-20 11:52                   ` [PATCH v9 9/9] ref: add symlink ref content check for files backend shejialuo
2024-11-20 14:26                   ` [PATCH v9 0/9] add " Patrick Steinhardt
2024-11-20 23:21                     ` Junio C Hamano