git.vger.kernel.org archive mirror
From: shejialuo <shejialuo@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Patrick Steinhardt <ps@pks.im>,
	Karthik Nayak <karthik.188@gmail.com>
Subject: Re: [PATCH v1 2/4] ref: add regular ref content check for files backend
Date: Wed, 21 Aug 2024 22:21:47 +0800
Message-ID: <ZsX3-yU52X2fe6JT@ArchLinux>
In-Reply-To: <xmqqed6j9m24.fsf@gitster.g>

On Tue, Aug 20, 2024 at 09:49:23AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We implicitly reply on "git-fsck(1)" to check the consistency of regular
> 
> "reply" -> "rely", I think.

I will fix this in the next version.

> > refs. However, when parsing the regular refs for files backend, we allow
> > the ref content to end with no newline or contain some garbages. We
> > should warn the user about above situations.
> 
> Hmph, should we?  
>

Sorry about this. I should not have written "should" here, because I did
not give any compelling reason why we need to introduce such checks; I
just told the reviewer "we should warn". I will try to avoid this kind of
mistake and give enough motivation in the future.

> What does the name-to-object-name-mapping layer (aka "get_oid" API)
> do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
> they are treated as valid ref in the "normal" code path, it needs a
> strong justification to tighten the rules retroactively, much
> stronger than "Our current code, and any of our older versions,
> would have written such a file as a loose ref with our code."
> 

Let me first describe what happens when we run the following
command:

  $ git checkout bad-branch

Using "gdb", I found the following call sequence:

  "cmd_checkout" -> "checkout_main" -> "parse_branchname_arg" ->
  ... -> "get_oid_basic" -> "repo_dwim_ref" -> ... ->
  "parse_loose_ref_contents" -> "parse_oid_hex_algop" ->
  "get_oid_hex_algop"

Looking into "object-name.c::get_oid_basic": if we pass an actual
object ID in hex, it calls "get_oid_hex_algop" directly. Otherwise, it
executes the following code:

  if (!len && reflog_len)
      refs_found = ...;
  else if (reflog_len)
      refs_found = ...;
  else
      refs_found = repo_dwim_ref(r, str, len, oid, &real_ref, !fatal);

  if (!refs_found)
      return -1;

As we can see, when "repo_dwim_ref" finds no corresponding ref,
"get_oid_basic" returns -1. This leads to an important conclusion:

  The "get_oid_basic" function relies on the "repo_dwim_ref" function to
  parse the ref and get the object ID it points to. In other words, it
  uses the interfaces provided by the ref backend.

Next, let's look at what "parse_loose_ref_contents" does for regular
refs.

  int parse_loose_ref_contents(...)
  {
      ...
      if (parse_oid_hex_algop(buf, oid, *p, algop) ||
         (*p != '\0' && !isspace(*p))) {
            *type |= REF_ISBROKEN;
            *failure_errno = EINVAL;
            return -1;
      }
      return 0;
  }

Now let's see what "parse_oid_hex_algop" does:

  int parse_oid_hex_algop(...)
  {
      int ret = get_oid_hex_algop(hex, oid, algop);
      if (!ret) {
          *end = hex + algop->hexsz;
      }
      return ret;
  }

If "get_oid_hex_algop" succeeds, we set the "end" pointer to
"hex + algop->hexsz", that is, just past the hex digits that were
consumed. "get_oid_hex_algop" eventually calls the "get_hash_hex_algop"
function:

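  static int get_hash_hex_algop(...)
  {
      int i;
      /* Consume exactly "rawsz" pairs of hex digits. */
      for (i = 0; i < algop->rawsz; i++) {
          int val = hex2chr(hex);
          if (val < 0)
              return -1;  /* not a hex digit (or hit '\0') */
          *hash++ = val;
          hex += 2;
      }
      /* Whatever follows the hex digits is left to the caller. */
      return 0;
  }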
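To make the decoding step concrete, here is a minimal, self-contained C
sketch of the same loop. It is not Git's code: "sketch_hexval",
"sketch_hex2chr", "sketch_get_hash_hex" and "sketch_parse_hash_hex" are
hypothetical stand-ins for "hexval", "hex2chr", "get_hash_hex_algop"
and "parse_oid_hex_algop", and the hash size is passed as a plain
"rawsz" instead of a "struct git_hash_algo".

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit (including '\0'). */
static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode two hex digits into one byte; -1 if either is invalid. */
static int sketch_hex2chr(const char *s)
{
	int hi = sketch_hexval(s[0]), lo;

	if (hi < 0)
		return -1;  /* also avoids reading past a terminating '\0' */
	lo = sketch_hexval(s[1]);
	if (lo < 0)
		return -1;
	return (hi << 4) | lo;
}

/* Decode exactly rawsz bytes; whatever follows the digits is not examined. */
static int sketch_get_hash_hex(const char *hex, unsigned char *hash,
			       size_t rawsz)
{
	size_t i;

	for (i = 0; i < rawsz; i++) {
		int val = sketch_hex2chr(hex);
		if (val < 0)
			return -1;
		*hash++ = (unsigned char)val;
		hex += 2;
	}
	return 0;
}

/* Like parse_oid_hex_algop(): on success, *end points just past the digits. */
static int sketch_parse_hash_hex(const char *hex, unsigned char *hash,
				 size_t rawsz, const char **end)
{
	int ret = sketch_get_hash_hex(hex, hash, rawsz);

	if (!ret)
		*end = hex + 2 * rawsz;
	return ret;
}
```

For example, with rawsz = 20 a 41-character hex string decodes
successfully and leaves "end" pointing at the extra character, while a
38-character string fails as soon as the decoder reaches the
terminating '\0'.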

"get_hash_hex_algop" decodes exactly "algop->rawsz" pairs of hex digits
into raw bytes (for SHA-1, 40 hex digits into 20 bytes) and never looks
at what comes after them. From the above code, we can conclude the
following:

1. "41053a9084501db79c72b14e8a5a0b67de3f91ae" is accepted, because
"get_hash_hex_algop" parses it successfully and "*p == '\0'".
2. "41053a9084501db79c72b14e8a5a0b67de3f91aef" (41 hexdigits) is
rejected. "get_hash_hex_algop" parses the first 40 hexdigits
successfully, but "*p" is the extra 'f', so "*p != '\0'" and
"isspace(*p)" is false. Thus the check in "parse_loose_ref_contents"
does not pass.
3. "1053a9084501db79c72b14e8a5a0b67de3f91a" (only 38 hexdigits) is
rejected, because "get_hash_hex_algop" cannot parse it.
4. "41053a9084501db79c72b14e8a5a0b67de3f91ae garbage" is accepted,
because "get_hash_hex_algop" parses it successfully and "isspace(*p)"
is true.
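The four cases can be checked against a small model of the rule in
"parse_loose_ref_contents". This is a sketch under simplifying
assumptions: "sketch_loose_ref_accepts" is a hypothetical name, it only
validates that the first "hexsz" characters are hex digits instead of
decoding them, and it uses the C library's isxdigit()/isspace(), which
may differ slightly from Git's own ctype macros (for example, the
standard isspace() also accepts '\v' and '\f').

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical model of the regular-ref rule in parse_loose_ref_contents():
 * the first hexsz characters must be hex digits, and the character right
 * after them must be '\0' or whitespace. Anything after that is ignored.
 * Returns 1 if the content is accepted, 0 if it is rejected.
 */
static int sketch_loose_ref_accepts(const char *buf, size_t hexsz)
{
	size_t i;

	for (i = 0; i < hexsz; i++) {
		/* isxdigit('\0') is false, so short content stops here. */
		if (!isxdigit((unsigned char)buf[i]))
			return 0;
	}
	return buf[hexsz] == '\0' || isspace((unsigned char)buf[hexsz]);
}
```

With hexsz = 40, cases 1 and 4 are accepted and cases 2 and 3 are
rejected, matching the analysis above; a trailing CRLF is accepted as
well, because '\r' counts as whitespace.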

With the above discussion, I can answer your comments one by one.

> If the content is short (e.g., in SHA-1 repository it only has 39
> hexdigit) even if that may be sufficient to uniquely name the
> object, we should warn about it, of course.

When the content is short, we should still report an error, even though
it may be sufficient to identify the object uniquely. This is because
what we care about here is the ref itself. As the above discussion
shows, "object-name.c" relies entirely on the interfaces provided by the
ref backend. "get_hash_hex_algop" rejects such short content, so
"object-name.c::get_oid_basic" eventually fails as well and returns -1.

> A file that has 64-hexdigit with a terminating LF at the end may be
> a valid file to be in $GIT_DIR/refs/ hierarchy in a SHA-256
> repository, but such a file in a SHA-1 repository should also be
> subject to a warning, as it could be a sign that somebody screwed up
> object format conversion.

I agree with this idea. In this implementation, however, we want to
reuse "parse_loose_ref_contents" to check the consistency of regular
refs. In a SHA-1 repository, "parse_loose_ref_contents" already rejects
such content, because the 41st character is a hexdigit rather than '\0'
or whitespace. I don't think we need to tell the user that "the content
has 64 hexdigits ..."; we just report "bad ref content". That is enough
to tell the user that something is wrong and that they need to check the
ref database.

> But a file that has only 40-hexdigit without a terminating LF at the
> end?  Or a file that has 40-hexdigit followed by a CRLF instead of
> LF?  Or a file that has the identical content as a valid ref on its
> first line, but has extra stuff on its second and subsequent lines?

This is the core reason why we want to introduce stricter checks. In the
current "parse_loose_ref_contents" function, as long as the byte right
after the hexdigits is '\0' or whitespace (a space, an LF, or the CR of a
CRLF), the content of the ref is treated as OK, no matter what follows.

In my view, however, we should warn the user about these situations.
The original code does not check refs strictly for the files backend,
and I think normal users should not need to touch the ref database
directly nowadays. So if we find garbage in the ref database, it could
be a sign for the user: "Watch out! Something may be wrong."
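To make the proposal concrete, here is one possible shape of a stricter
check, as a sketch only. The enum and function names are hypothetical
(they are not Git's fsck message IDs), and the sketch assumes that the
expected on-disk format is exactly the hexdigits followed by a single
LF. Which of the non-OK results should be a warning and which an error
is precisely what this thread is still deciding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result codes; not Git's fsck message IDs. */
enum sketch_ref_status {
	SKETCH_REF_OK,               /* hex digits followed by exactly one LF */
	SKETCH_REF_MISSING_NEWLINE,  /* hex digits and nothing else */
	SKETCH_REF_TRAILING_CONTENT, /* accepted today, but more follows the hex */
	SKETCH_REF_BAD_CONTENT,      /* already rejected by the current rule */
};

static enum sketch_ref_status sketch_check_regular_ref(const char *buf,
							size_t hexsz)
{
	size_t i;

	/* Short or non-hex content: the current parser rejects it too. */
	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return SKETCH_REF_BAD_CONTENT;

	if (buf[hexsz] == '\0')
		return SKETCH_REF_MISSING_NEWLINE;
	if (!isspace((unsigned char)buf[hexsz]))
		return SKETCH_REF_BAD_CONTENT;  /* e.g. a 41st hex digit */
	if (buf[hexsz] == '\n' && buf[hexsz + 1] == '\0')
		return SKETCH_REF_OK;
	/* CRLF, a space plus garbage, extra lines, ... */
	return SKETCH_REF_TRAILING_CONTENT;
}
```

Under this sketch, a file without a final LF, a CRLF ending, and extra
lines after the first one each get a distinct, non-fatal status, while
content that the current parser already rejects keeps the "bad content"
status.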

> "What are we protecting us from with this tightening?" is the
> question we should be asking ourselves, when evaluating each of
> these new rules that fsck used not to care about.

That's a hard question, really. I find it hard to decide what we should
do, and the motivation is hard to describe. But I hope this reply makes
things clearer.

Thanks,
Jialuo


Thread overview: 209+ messages
2024-08-13 14:18 [RFC] Implement ref content consistency check shejialuo
2024-08-15 10:19 ` karthik nayak
2024-08-15 13:37   ` shejialuo
2024-08-16  9:06     ` Patrick Steinhardt
2024-08-16 16:39       ` Junio C Hamano
2024-08-18 15:00 ` [PATCH v1 0/4] add ref content check for files backend shejialuo
2024-08-18 15:01   ` [PATCH v1 1/4] fsck: introduce "FSCK_REF_REPORT_DEFAULT" macro shejialuo
2024-08-20 16:25     ` Junio C Hamano
2024-08-21 12:49       ` shejialuo
2024-08-18 15:01   ` [PATCH v1 2/4] ref: add regular ref content check for files backend shejialuo
2024-08-20 16:49     ` Junio C Hamano
2024-08-21 14:21       ` shejialuo [this message]
2024-08-22  8:46       ` Patrick Steinhardt
2024-08-22 16:13         ` Junio C Hamano
2024-08-22 16:17           ` Junio C Hamano
2024-08-23  7:21             ` Patrick Steinhardt
2024-08-23 11:30               ` shejialuo
2024-08-22  8:48     ` Patrick Steinhardt
2024-08-22 12:06       ` shejialuo
2024-08-18 15:01   ` [PATCH v1 3/4] ref: add symbolic " shejialuo
2024-08-22  8:53     ` Patrick Steinhardt
2024-08-22 12:42       ` shejialuo
2024-08-23  5:36         ` Patrick Steinhardt
2024-08-23 11:37           ` shejialuo
2024-08-18 15:02   ` [PATCH v1 4/4] ref: add symlink ref consistency " shejialuo
2024-08-27 16:04   ` [PATCH v2 0/4] add ref content " shejialuo
2024-08-27 16:07     ` [PATCH v2 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
2024-08-27 17:49       ` Junio C Hamano
2024-08-27 16:07     ` [PATCH v2 2/4] ref: add regular ref content check for files backend shejialuo
2024-08-27 16:19       ` shejialuo
2024-08-27 18:21       ` Junio C Hamano
2024-08-28 12:50         ` Patrick Steinhardt
2024-08-28 16:32           ` Junio C Hamano
2024-08-29 10:19             ` Patrick Steinhardt
2024-08-28 14:31         ` shejialuo
2024-08-28 16:45           ` Junio C Hamano
2024-08-28 12:50       ` Patrick Steinhardt
2024-08-28 14:41         ` shejialuo
2024-08-28 15:30         ` Junio C Hamano
2024-08-27 16:08     ` [PATCH v2 3/4] ref: add symbolic " shejialuo
2024-08-27 19:19       ` Junio C Hamano
2024-08-28 15:26         ` shejialuo
2024-08-28 12:50       ` Patrick Steinhardt
2024-08-28 15:36         ` shejialuo
2024-08-28 15:41         ` Junio C Hamano
2024-08-29 10:11           ` Patrick Steinhardt
2024-08-27 16:08     ` [PATCH v2 4/4] ref: add symlink ref " shejialuo
2024-08-28 18:42     ` [PATCH] SQUASH??? remove unused parameters Junio C Hamano
2024-08-28 21:28     ` [PATCH v2 0/4] add ref content check for files backend Junio C Hamano
2024-08-29  4:02       ` Jeff King
2024-08-29  4:59         ` Junio C Hamano
2024-08-29  7:00           ` Patrick Steinhardt
2024-08-29 15:07             ` Junio C Hamano
2024-08-29 19:48             ` Jeff King
2024-08-29 15:48           ` shejialuo
2024-08-29 16:12             ` Junio C Hamano
2024-08-29 15:00         ` [PATCH 8/6] CodingGuidelines: also mention MAYBE_UNUSED Junio C Hamano
2024-08-29 17:52           ` Jeff King
2024-08-29 18:06             ` Junio C Hamano
2024-08-29 18:18               ` [PATCH v2] " Junio C Hamano
2024-08-29 18:27                 ` [PATCH 9/6] git-compat-util: guard definition of MAYBE_UNUSED with __GNUC__ Junio C Hamano
2024-08-29 19:45                   ` Jeff King
2024-08-29 20:19                     ` Junio C Hamano
2024-08-29 19:40                 ` [PATCH v2] CodingGuidelines: also mention MAYBE_UNUSED Jeff King
2024-09-03 12:18     ` [PATCH v3 0/4] add ref content check for files backend shejialuo
2024-09-03 12:20       ` [PATCH v3 1/4] ref: initialize "fsck_ref_report" with zero shejialuo
2024-09-03 12:20       ` [PATCH v3 2/4] ref: add regular ref content check for files backend shejialuo
2024-09-09 15:04         ` Patrick Steinhardt
2024-09-10  7:42           ` shejialuo
2024-09-10 16:07         ` karthik nayak
2024-09-13 10:25           ` shejialuo
2024-09-03 12:20       ` [PATCH v3 3/4] ref: add symref " shejialuo
2024-09-09 15:04         ` Patrick Steinhardt
2024-09-10  8:02           ` shejialuo
2024-09-10 22:19         ` karthik nayak
2024-09-12  4:00           ` shejialuo
2024-09-03 12:21       ` [PATCH v3 4/4] ref: add symlink ref " shejialuo
2024-09-09 15:04         ` Patrick Steinhardt
2024-09-10  8:28           ` shejialuo
2024-09-13 17:14       ` [PATCH v4 0/5] add " shejialuo
2024-09-13 17:17         ` [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero shejialuo
2024-09-18 16:41           ` Junio C Hamano
2024-09-13 17:17         ` [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-09-18 18:59           ` Junio C Hamano
2024-09-22 14:58             ` shejialuo
2024-09-13 17:17         ` [PATCH v4 3/5] ref: add more strict checks for regular refs shejialuo
2024-09-18 19:39           ` Junio C Hamano
2024-09-22 15:06             ` shejialuo
2024-09-22 16:48               ` Junio C Hamano
2024-09-13 17:18         ` [PATCH v4 4/5] ref: add symref content check for files backend shejialuo
2024-09-18 20:19           ` Junio C Hamano
2024-09-22 15:53             ` shejialuo
2024-09-22 16:55               ` Junio C Hamano
2024-09-13 17:18         ` [PATCH v4 5/5] ref: add symlink ref " shejialuo
2024-09-18 23:02           ` Junio C Hamano
2024-09-18 16:49         ` [PATCH v4 0/5] add " Junio C Hamano
2024-09-29  7:13         ` [PATCH v5 0/9] " shejialuo
2024-09-29  7:15           ` [PATCH v5 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-10-08  7:29             ` Karthik Nayak
2024-09-29  7:15           ` [PATCH v5 2/9] builtin/refs: support multiple worktrees check for refs shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:42               ` shejialuo
2024-10-07  9:16                 ` Patrick Steinhardt
2024-10-07 12:06                   ` shejialuo
2024-09-29  7:15           ` [PATCH v5 3/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:42               ` shejialuo
2024-10-07  9:18                 ` Patrick Steinhardt
2024-10-07 12:08                   ` shejialuo
2024-10-08  7:43             ` Karthik Nayak
2024-10-08 12:24               ` shejialuo
2024-10-08 17:44                 ` Junio C Hamano
2024-10-09  8:05                   ` Patrick Steinhardt
2024-10-09 11:59                     ` shejialuo
2024-10-10  6:52                       ` Patrick Steinhardt
2024-10-10 16:00                         ` Junio C Hamano
2024-10-09 11:55                   ` shejialuo
2024-09-29  7:16           ` [PATCH v5 4/9] ref: add more strict checks for regular refs shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:44               ` shejialuo
2024-10-07  9:25                 ` Patrick Steinhardt
2024-10-07 12:19                   ` shejialuo
2024-09-29  7:16           ` [PATCH v5 5/9] ref: add basic symref content check for files backend shejialuo
2024-10-08  7:58             ` Karthik Nayak
2024-10-08 12:18               ` shejialuo
2024-09-29  7:16           ` [PATCH v5 6/9] ref: add escape check for the referent of symref shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:44               ` shejialuo
2024-10-07  9:26                 ` Patrick Steinhardt
2024-09-29  7:17           ` [PATCH v5 7/9] ref: enhance escape situation for worktrees shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:45               ` shejialuo
2024-09-29  7:17           ` [PATCH v5 8/9] t0602: add ref content checks " shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:45               ` shejialuo
2024-09-29  7:17           ` [PATCH v5 9/9] ref: add symlink ref content check for files backend shejialuo
2024-10-07  6:58             ` Patrick Steinhardt
2024-10-07  8:45               ` shejialuo
2024-09-30 18:57           ` [PATCH v5 0/9] add " Junio C Hamano
2024-10-01  3:40             ` shejialuo
2024-10-07 12:49           ` shejialuo
2024-10-21 13:32           ` [PATCH v6 " shejialuo
2024-10-21 13:34             ` [PATCH v6 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-10-21 13:34             ` [PATCH v6 2/9] ref: check the full refname instead of basename shejialuo
2024-10-21 15:38               ` karthik nayak
2024-10-22 11:42                 ` shejialuo
2024-11-05  7:11               ` Patrick Steinhardt
2024-11-06 12:37                 ` shejialuo
2024-10-21 13:34             ` [PATCH v6 3/9] ref: initialize target name outside of check functions shejialuo
2024-10-21 15:49               ` karthik nayak
2024-11-05  7:11               ` Patrick Steinhardt
2024-11-06 12:32                 ` shejialuo
2024-11-06 13:14                   ` Patrick Steinhardt
2024-10-21 13:34             ` [PATCH v6 4/9] ref: support multiple worktrees check for refs shejialuo
2024-10-21 15:56               ` karthik nayak
2024-10-22 11:44                 ` shejialuo
2024-11-05  7:11               ` Patrick Steinhardt
2024-11-05 12:52                 ` shejialuo
2024-11-06  6:34                   ` Patrick Steinhardt
2024-11-06 12:20                     ` shejialuo
2024-10-21 13:34             ` [PATCH v6 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-05  7:11               ` Patrick Steinhardt
2024-10-21 13:34             ` [PATCH v6 6/9] ref: add more strict checks for regular refs shejialuo
2024-10-21 13:35             ` [PATCH v6 7/9] ref: add basic symref content check for files backend shejialuo
2024-10-21 13:35             ` [PATCH v6 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-10-21 13:35             ` [PATCH v6 9/9] ref: add symlink ref content check for files backend shejialuo
2024-10-21 16:09             ` [PATCH v6 0/9] add " Taylor Blau
2024-10-22 11:41               ` shejialuo
2024-10-21 16:18             ` Taylor Blau
2024-11-10 12:07             ` [PATCH v7 " shejialuo
2024-11-10 12:09               ` [PATCH v7 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-11-10 12:09               ` [PATCH v7 2/9] ref: check the full refname instead of basename shejialuo
2024-11-10 12:09               ` [PATCH v7 3/9] ref: initialize ref name outside of check functions shejialuo
2024-11-10 12:09               ` [PATCH v7 4/9] ref: support multiple worktrees check for refs shejialuo
2024-11-10 12:09               ` [PATCH v7 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-13  7:36                 ` Patrick Steinhardt
2024-11-14 12:09                   ` shejialuo
2024-11-10 12:10               ` [PATCH v7 6/9] ref: add more strict checks for regular refs shejialuo
2024-11-10 12:10               ` [PATCH v7 7/9] ref: add basic symref content check for files backend shejialuo
2024-11-10 12:10               ` [PATCH v7 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-11-10 12:10               ` [PATCH v7 9/9] ref: add symlink ref content check for files backend shejialuo
2024-11-13  7:36                 ` Patrick Steinhardt
2024-11-14 12:18                   ` shejialuo
2024-11-13  7:36               ` [PATCH v7 0/9] add " Patrick Steinhardt
2024-11-14 16:51               ` [PATCH v8 " shejialuo
2024-11-14 16:53                 ` [PATCH v8 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-11-14 16:54                 ` [PATCH v8 2/9] ref: check the full refname instead of basename shejialuo
2024-11-14 16:54                 ` [PATCH v8 3/9] ref: initialize ref name outside of check functions shejialuo
2024-11-14 16:54                 ` [PATCH v8 4/9] ref: support multiple worktrees check for refs shejialuo
2024-11-14 16:54                 ` [PATCH v8 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-15  7:11                   ` Patrick Steinhardt
2024-11-15 11:08                     ` shejialuo
2024-11-14 16:54                 ` [PATCH v8 6/9] ref: add more strict checks for regular refs shejialuo
2024-11-14 16:54                 ` [PATCH v8 7/9] ref: add basic symref content check for files backend shejialuo
2024-11-14 16:54                 ` [PATCH v8 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-11-14 16:55                 ` [PATCH v8 9/9] ref: add symlink ref content check for files backend shejialuo
2024-11-15 11:10                 ` [PATCH v8 0/9] add " shejialuo
2024-11-20 11:47                 ` [PATCH v9 " shejialuo
2024-11-20 11:51                   ` [PATCH v9 1/9] ref: initialize "fsck_ref_report" with zero shejialuo
2024-11-20 11:51                   ` [PATCH v9 2/9] ref: check the full refname instead of basename shejialuo
2024-11-20 11:51                   ` [PATCH v9 3/9] ref: initialize ref name outside of check functions shejialuo
2024-11-20 11:51                   ` [PATCH v9 4/9] ref: support multiple worktrees check for refs shejialuo
2024-11-20 11:51                   ` [PATCH v9 5/9] ref: port git-fsck(1) regular refs check for files backend shejialuo
2024-11-20 11:51                   ` [PATCH v9 6/9] ref: add more strict checks for regular refs shejialuo
2024-11-20 11:52                   ` [PATCH v9 7/9] ref: add basic symref content check for files backend shejialuo
2024-11-20 11:52                   ` [PATCH v9 8/9] ref: check whether the target of the symref is a ref shejialuo
2024-11-20 11:52                   ` [PATCH v9 9/9] ref: add symlink ref content check for files backend shejialuo
2024-11-20 14:26                   ` [PATCH v9 0/9] add " Patrick Steinhardt
2024-11-20 23:21                     ` Junio C Hamano
