git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Implement consistency check for packed refs
@ 2024-12-16 13:58 shejialuo
  2024-12-16 14:49 ` Patrick Steinhardt
  0 siblings, 1 reply; 3+ messages in thread
From: shejialuo @ 2024-12-16 13:58 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Michael Haggerty,
	Junio C Hamano

Hi all,

At current, I have already implemented the consistency check for files
backend. My next step is to implement the consistency check for packed
refs. And during these days, I have learned the source code of the
"refs/packed-backend.c".

The current "git-fsck(1)" implicitly checks the packed refs by calling
the method `refs/packed-backend.c::packed_refs_lock` which alls
`get_snapshot` and `create_snapshot`.

In the `create_snapshot` function, it will check some consistency of the
"packed-refs" file, if anything is wrong, the program will die.
"git-fsck(1)" relies on this to check the basic format correctness for
"packed-refs" file. It's not suitable to just call "packed_refs_lock"
in the code, we should not die the program.

Based on above, I have some ideas to implement the functionality. But
before I implement the actual code, I want to share my ideas to the
mailing list to get some feedback.

There are two ways we could add consistency check for packed refs.

1. We simply read the raw file "packed-refs" file, check each line. Of
course, we should also need to check whether the content is sorted.
2. We may reuse the data structure "struct snapshot" to do this. And we
call "packed_refs_lock" without calling the "creating_snapshot" instead,
we should just check. The reason why we do this is that we could reuse
some functions defined in the "packed-backend.c" instead of repeating
some logics. However, this way would be more complicated and require
more effort.

However, one thing I am not sure is that should we lock the raw file
when checking? From my perspective, we should lock the file because we
are checking the consistency of it. If other processes are processing
the "packed-refs", we may encounter inconsistency and things would be
strange.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] Implement consistency check for packed refs
  2024-12-16 13:58 [RFC] Implement consistency check for packed refs shejialuo
@ 2024-12-16 14:49 ` Patrick Steinhardt
  2024-12-17 11:44   ` shejialuo
  0 siblings, 1 reply; 3+ messages in thread
From: Patrick Steinhardt @ 2024-12-16 14:49 UTC (permalink / raw)
  To: shejialuo; +Cc: git, Karthik Nayak, Michael Haggerty, Junio C Hamano

On Mon, Dec 16, 2024 at 09:58:13PM +0800, shejialuo wrote:
> Hi all,
> 
> At current, I have already implemented the consistency check for files
> backend. My next step is to implement the consistency check for packed
> refs. And during these days, I have learned the source code of the
> "refs/packed-backend.c".

Great. I'm also starting to work into the direction of consistency
checks for reftables right now, which requires a couple more changes for
the reftable library to expose reftable blocks. I'll probably get to it
early in the next release cycle.

> The current "git-fsck(1)" implicitly checks the packed refs by calling
> the method `refs/packed-backend.c::packed_refs_lock` which alls
> `get_snapshot` and `create_snapshot`.
> 
> In the `create_snapshot` function, it will check some consistency of the
> "packed-refs" file, if anything is wrong, the program will die.
> "git-fsck(1)" relies on this to check the basic format correctness for
> "packed-refs" file. It's not suitable to just call "packed_refs_lock"
> in the code, we should not die the program.
> 
> Based on above, I have some ideas to implement the functionality. But
> before I implement the actual code, I want to share my ideas to the
> mailing list to get some feedback.
> 
> There are two ways we could add consistency check for packed refs.
> 
> 1. We simply read the raw file "packed-refs" file, check each line. Of
> course, we should also need to check whether the content is sorted.
> 2. We may reuse the data structure "struct snapshot" to do this. And we
> call "packed_refs_lock" without calling the "creating_snapshot" instead,
> we should just check. The reason why we do this is that we could reuse
> some functions defined in the "packed-backend.c" instead of repeating
> some logics. However, this way would be more complicated and require
> more effort.

Hm. I cannot really say much on this. The important part is that you
have enough information at hand to be able to implement those checks. If
you have all necessary information in both cases I would recommend to go
with the one that is simpler.

> However, one thing I am not sure is that should we lock the raw file
> when checking? From my perspective, we should lock the file because we
> are checking the consistency of it. If other processes are processing
> the "packed-refs", we may encounter inconsistency and things would be
> strange.

The consistency checks may run for an extended amount of time in repos
with huge number of refs, and locking for the whole duration may easily
cause problems.

Furthermore, "packed-refs" files are written atomically: the client
writes a new "packed-refs.lock" file, syncs it to disk and then renames
it into place. This operation doesn't invalidate any file descriptors
that refer to the old file and you can continue reading from it, so the
snapshot would remain consistent in itself. It could of course become
inconsistent with any loose refs, but that's always the case with the
"files" backend and something we'll have to accept.

So I don't see any reason why the consistency checks should lock the
"packed-refs" file.

Patrick

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] Implement consistency check for packed refs
  2024-12-16 14:49 ` Patrick Steinhardt
@ 2024-12-17 11:44   ` shejialuo
  0 siblings, 0 replies; 3+ messages in thread
From: shejialuo @ 2024-12-17 11:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Michael Haggerty, Junio C Hamano

On Mon, Dec 16, 2024 at 03:49:01PM +0100, Patrick Steinhardt wrote:
> On Mon, Dec 16, 2024 at 09:58:13PM +0800, shejialuo wrote:
> > Hi all,
> > 
> > At current, I have already implemented the consistency check for files
> > backend. My next step is to implement the consistency check for packed
> > refs. And during these days, I have learned the source code of the
> > "refs/packed-backend.c".
> 
> Great. I'm also starting to work into the direction of consistency
> checks for reftables right now, which requires a couple more changes for
> the reftable library to expose reftable blocks. I'll probably get to it
> early in the next release cycle.
> 

I am also happy to see this. We are gradually making things work. It's
good. And maybe we could make a GSoC project "implement consistency check
for reftable backend" if we decide to attend GSoC next year.

> > The current "git-fsck(1)" implicitly checks the packed refs by calling
> > the method `refs/packed-backend.c::packed_refs_lock` which alls
> > `get_snapshot` and `create_snapshot`.
> > 
> > In the `create_snapshot` function, it will check some consistency of the
> > "packed-refs" file, if anything is wrong, the program will die.
> > "git-fsck(1)" relies on this to check the basic format correctness for
> > "packed-refs" file. It's not suitable to just call "packed_refs_lock"
> > in the code, we should not die the program.
> > 
> > Based on above, I have some ideas to implement the functionality. But
> > before I implement the actual code, I want to share my ideas to the
> > mailing list to get some feedback.
> > 
> > There are two ways we could add consistency check for packed refs.
> > 
> > 1. We simply read the raw file "packed-refs" file, check each line. Of
> > course, we should also need to check whether the content is sorted.
> > 2. We may reuse the data structure "struct snapshot" to do this. And we
> > call "packed_refs_lock" without calling the "creating_snapshot" instead,
> > we should just check. The reason why we do this is that we could reuse
> > some functions defined in the "packed-backend.c" instead of repeating
> > some logics. However, this way would be more complicated and require
> > more effort.
> 
> Hm. I cannot really say much on this. The important part is that you
> have enough information at hand to be able to implement those checks. If
> you have all necessary information in both cases I would recommend to go
> with the one that is simpler.
> 
> > However, one thing I am not sure is that should we lock the raw file
> > when checking? From my perspective, we should lock the file because we
> > are checking the consistency of it. If other processes are processing
> > the "packed-refs", we may encounter inconsistency and things would be
> > strange.
> 
> The consistency checks may run for an extended amount of time in repos
> with huge number of refs, and locking for the whole duration may easily
> cause problems.
> 
> Furthermore, "packed-refs" files are written atomically: the client
> writes a new "packed-refs.lock" file, syncs it to disk and then renames
> it into place. This operation doesn't invalidate any file descriptors
> that refer to the old file and you can continue reading from it, so the
> snapshot would remain consistent in itself. It could of course become
> inconsistent with any loose refs, but that's always the case with the
> "files" backend and something we'll have to accept.
> 
> So I don't see any reason why the consistency checks should lock the
> "packed-refs" file.
> 

Thanks, I either don't want to lock the file because this would bring
complexity but without much benefit.

> Patrick

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-12-17 11:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-16 13:58 [RFC] Implement consistency check for packed refs shejialuo
2024-12-16 14:49 ` Patrick Steinhardt
2024-12-17 11:44   ` shejialuo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).