From: Junio C Hamano <gitster@pobox.com>
To: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Elijah Newren <newren@gmail.com>,
	 Matthew John Cheetham <mjcheetham@outlook.com>
Subject: Re: [PATCH] fsck: snapshot default refs before object walk
Date: Tue, 30 Dec 2025 09:45:57 +0900
Message-ID: <xmqq344siypm.fsf@gitster.g>
In-Reply-To: <pull.2026.git.1767035549378.gitgitgadget@gmail.com> (Elijah Newren via GitGitGadget's message of "Mon, 29 Dec 2025 19:12:29 +0000")

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This problem doesn't occur when refs are specified on the command line
> for us to check, since we use those specified refs for both walking and
> checking.  Using the same refs for walking and checking seems to just
> make sense, so modify the existing code to do the same when refs aren't
> specified.

Excellent analysis and good approach.

> Snapshot the refs at the beginning, and also ignore all
> reflog entries since the time of our snapshot (while this technically
> means we could ignore a reflog entry created before the fsck process
> if the local clock is weird, since reflogs are local-only there are not
> concerns about differences between clocks on different machines).

Repository on a network filesystem being accessed by hosts with
broken clock?

I do not think our reflog API offers a way to (1) ask for a token
that marks the current state and (2) later hand that token back and
iterate over the entries while ignoring those added after the token
was issued, so going by the reflog timestamp is probably the best we
can do.  Any approach may get confused when the user tries to be
cute and issues "reflog delete" or "reflog expire" in the middle
anyway, I suspect ;-)

> While worries about live updates while running fsck is likely of most
> interest for forge operators, it will likely also benefit those with
> automated jobs (such as git maintenance) or even casual users who want
> to do other work in their clone while fsck is running.

Great.  Will queue.  Thanks.

> @@ -509,6 +510,9 @@ static int fsck_handle_reflog_ent(const char *refname,
>  				  timestamp_t timestamp, int tz UNUSED,
>  				  const char *message UNUSED, void *cb_data UNUSED)
>  {
> +	if (now && timestamp > now)
> +		return 0;
> +
>  	if (verbose)
>  		fprintf_ln(stderr, _("Checking reflog %s->%s"),
>  			   oid_to_hex(ooid), oid_to_hex(noid));
> @@ -567,14 +571,53 @@ static int fsck_head_link(const char *head_ref_name,
>  			  const char **head_points_at,
>  			  struct object_id *head_oid);
>  
> -static void get_default_heads(void)
> +struct ref_snapshot {
> +	size_t nr;
> +	size_t name_alloc;
> +	size_t oid_alloc;
> +	char **refname;
> +	struct object_id *oid;
> +};

This data structure is somewhat unexpected.  Instead of a struct
that holds two parallel arrays, I would have expected an array of
"struct { refname, oid }", which would leave room to add the "token
to mark the latest reflog entry" I alluded to earlier, once such an
API function materializes.
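
To illustrate the shape meant here, a minimal standalone sketch
follows.  It is not the patch's code: "struct object_id_sketch" is a
stand-in for git's "struct object_id", and the names
ref_snapshot_entry, ref_snapshot_add() and ref_snapshot_clear() are
hypothetical.  Real code in builtin/fsck.c would presumably use
ALLOC_GROW() and xstrdup() rather than the plain libc calls shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for git's struct object_id (assumption: 32 bytes fits SHA-256). */
struct object_id_sketch {
	unsigned char hash[32];
};

/* One snapshotted ref: its name and the value it had at snapshot time. */
struct ref_snapshot_entry {
	char *refname;
	struct object_id_sketch oid;
	/* a "latest reflog entry" token could be added here later */
};

/* A single growable array of entries instead of two parallel arrays. */
struct ref_snapshot {
	struct ref_snapshot_entry *entry;
	size_t nr, alloc;
};

/* Append a copy of (refname, oid); returns 0 on success, -1 on OOM. */
static int ref_snapshot_add(struct ref_snapshot *snap, const char *refname,
			    const struct object_id_sketch *oid)
{
	size_t len;
	char *copy;

	assert(snap && refname && oid);
	if (snap->nr == snap->alloc) {
		size_t new_alloc = snap->alloc ? 2 * snap->alloc : 16;
		struct ref_snapshot_entry *grown =
			realloc(snap->entry, new_alloc * sizeof(*grown));
		if (!grown)
			return -1;
		snap->entry = grown;
		snap->alloc = new_alloc;
	}
	len = strlen(refname) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, refname, len);
	snap->entry[snap->nr].refname = copy;
	snap->entry[snap->nr].oid = *oid;
	snap->nr++;
	return 0;
}

/* Release all entries and reset the snapshot to its empty state. */
static void ref_snapshot_clear(struct ref_snapshot *snap)
{
	size_t i;

	for (i = 0; i < snap->nr; i++)
		free(snap->entry[i].refname);
	free(snap->entry);
	snap->entry = NULL;
	snap->nr = snap->alloc = 0;
}
```

With one array there is a single nr/alloc pair to keep consistent,
and a per-ref field added later travels with its refname and oid.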


[Footnote]

We could call refs_for_each_reflog_ent_reverse(), grab the
parameters that each_reflog_ent_fn receives as that "token" for the
latest reflog entry and stop.  That way, we will learn the value of
<old,new,committer,timestamp,tz,msg>, which should be a robust
enough unique key.

After that when iterating over the reflog, we know we should stop
after processing the reflog entry that holds the recorded value.
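
A rough standalone sketch of that idea, under stated assumptions: the
reflog is simulated as an oldest-first in-memory array rather than
read through the refs API, and reflog_ent_sketch, same_entry() and
walk_until_token() are hypothetical names, not existing functions.
The token is simply a copy of the newest entry taken at snapshot
time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated reflog entry carrying <old,new,committer,timestamp,tz,msg>. */
struct reflog_ent_sketch {
	const char *old_hex;
	const char *new_hex;
	const char *committer;
	unsigned long long timestamp;
	int tz;
	const char *msg;
};

/* Two entries match only when every field of the key agrees. */
static int same_entry(const struct reflog_ent_sketch *a,
		      const struct reflog_ent_sketch *b)
{
	return !strcmp(a->old_hex, b->old_hex) &&
	       !strcmp(a->new_hex, b->new_hex) &&
	       !strcmp(a->committer, b->committer) &&
	       a->timestamp == b->timestamp &&
	       a->tz == b->tz &&
	       !strcmp(a->msg, b->msg);
}

/*
 * Walk the log oldest-first and stop after the entry matching the
 * token, so entries appended after the snapshot are never visited.
 * A NULL token means the reflog was empty at snapshot time, so
 * nothing is processed.  If the token entry has since been expired
 * or deleted, the whole log is walked.  Returns the number of
 * entries processed.
 */
static size_t walk_until_token(const struct reflog_ent_sketch *log, size_t nr,
			       const struct reflog_ent_sketch *token)
{
	size_t i, processed = 0;

	assert(log || !nr);
	if (!token)
		return 0;
	for (i = 0; i < nr; i++) {
		processed++; /* real code would check the entry here */
		if (same_entry(&log[i], token))
			break;
	}
	return processed;
}
```

Unlike the timestamp comparison, this does not depend on clocks
being sane, but it still gets confused by "reflog expire" or "reflog
delete" running in the middle, as noted above.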