git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: shejialuo <shejialuo@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC] Implement consistency check for packed refs
Date: Tue, 17 Dec 2024 19:44:56 +0800	[thread overview]
Message-ID: <Z2FkOO9bl-IS_era@ArchLinux> (raw)
In-Reply-To: <Z2A93X2rxZEnYE76@pks.im>

On Mon, Dec 16, 2024 at 03:49:01PM +0100, Patrick Steinhardt wrote:
> On Mon, Dec 16, 2024 at 09:58:13PM +0800, shejialuo wrote:
> > Hi all,
> > 
> > At current, I have already implemented the consistency check for files
> > backend. My next step is to implement the consistency check for packed
> > refs. And during these days, I have learned the source code of the
> > "refs/packed-backend.c".
> 
> Great. I'm also starting to work into the direction of consistency
> checks for reftables right now, which requires a couple more changes for
> the reftable library to expose reftable blocks. I'll probably get to it
> early in the next release cycle.
> 

I am also happy to see this. We are gradually making things work. It's
good. And maybe we could make a GSoC project "implement consistency check
for reftable backend" if we decide to attend GSoC next year.

> > The current "git-fsck(1)" implicitly checks the packed refs by calling
> > the method `refs/packed-backend.c::packed_refs_lock` which alls
> > `get_snapshot` and `create_snapshot`.
> > 
> > In the `create_snapshot` function, it will check some consistency of the
> > "packed-refs" file, if anything is wrong, the program will die.
> > "git-fsck(1)" relies on this to check the basic format correctness for
> > "packed-refs" file. It's not suitable to just call "packed_refs_lock"
> > in the code, we should not die the program.
> > 
> > Based on above, I have some ideas to implement the functionality. But
> > before I implement the actual code, I want to share my ideas to the
> > mailing list to get some feedback.
> > 
> > There are two ways we could add consistency check for packed refs.
> > 
> > 1. We simply read the raw file "packed-refs" file, check each line. Of
> > course, we should also need to check whether the content is sorted.
> > 2. We may reuse the data structure "struct snapshot" to do this. And we
> > call "packed_refs_lock" without calling the "creating_snapshot" instead,
> > we should just check. The reason why we do this is that we could reuse
> > some functions defined in the "packed-backend.c" instead of repeating
> > some logics. However, this way would be more complicated and require
> > more effort.
> 
> Hm. I cannot really say much on this. The important part is that you
> have enough information at hand to be able to implement those checks. If
> you have all necessary information in both cases I would recommend to go
> with the one that is simpler.
> 
> > However, one thing I am not sure is that should we lock the raw file
> > when checking? From my perspective, we should lock the file because we
> > are checking the consistency of it. If other processes are processing
> > the "packed-refs", we may encounter inconsistency and things would be
> > strange.
> 
> The consistency checks may run for an extended amount of time in repos
> with huge number of refs, and locking for the whole duration may easily
> cause problems.
> 
> Furthermore, "packed-refs" files are written atomically: the client
> writes a new "packed-refs.lock" file, syncs it to disk and then renames
> it into place. This operation doesn't invalidate any file descriptors
> that refer to the old file and you can continue reading from it, so the
> snapshot would remain consistent in itself. It could of course become
> inconsistent with any loose refs, but that's always the case with the
> "files" backend and something we'll have to accept.
> 
> So I don't see any reason why the consistency checks should lock the
> "packed-refs" file.
> 

Thanks, I either don't want to lock the file because this would bring
complexity but without much benefit.

> Patrick

      reply	other threads:[~2024-12-17 11:44 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-16 13:58 [RFC] Implement consistency check for packed refs shejialuo
2024-12-16 14:49 ` Patrick Steinhardt
2024-12-17 11:44   ` shejialuo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z2FkOO9bl-IS_era@ArchLinux \
    --to=shejialuo@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=karthik.188@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).