git.vger.kernel.org archive mirror
* Re: Discuss GSoC: Implement consistency checks for refs
@ 2024-03-10 10:01 shejialuo
  2024-03-14  3:38 ` Kaartic Sivaraam
  0 siblings, 1 reply; 4+ messages in thread
From: shejialuo @ 2024-03-10 10:01 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Thanks for your help. I'm sorry for the delay in responding to your email
due to my internship.

> I know this is splitting hairs, but git-fsck(1) doesn't give us the
> tools to avoid corruption. It only gives us the tools to detect it after
> the fact.

I did misunderstand `git-fsck(1)`.

This time, I have read more of the source code for `git-fsck` and the refs
internals, so I would like to discuss how the infrastructure could be
implemented.

I am inspired by `refs-internal.h`. This file declares `ref_storage_be`,
and every backend should implement interfaces such as
`ref_store_init_fn` and `ref_init_db_fn`. `refs.h` then provides the
interfaces to other modules.

Based on the above idea, I think we could create files in the `refs`
directory, for example a file called `ref-check.h`, in which we design
the check interfaces for the different backends.

After that, we could compose this structure into `ref_storage_be` and
call these interfaces from `fsck.c`. If some backends need different
interfaces, we could downcast to the specific type to call the specific
functions. (Actually, I have learned a lot about how OOP is implemented
in C.)
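
To make the idea above more concrete, here is a minimal sketch of a
check interface composed into the backend structure, with a downcast for
the backend-specific part. Every name in it (`ref_check_be`,
`ref_check_fn`, `fsck_refs`, `broken_loose_refs`) is hypothetical, and
the structs are simplified stand-ins, not the real definitions in
`refs-internal.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not Git's actual API. */
struct ref_store;

/* A check returns the number of problems it found. */
typedef int ref_check_fn(struct ref_store *store);

/* What a hypothetical "ref-check.h" might declare. */
struct ref_check_be {
	ref_check_fn *check_generic;
	ref_check_fn *check_backend_specific; /* may be NULL */
};

/* Simplified backend descriptor with the check interface composed in. */
struct ref_storage_be {
	const char *name;
	const struct ref_check_be *check;
};

struct ref_store {
	const struct ref_storage_be *be;
};

/* The base must be the first member so that the downcast is valid. */
struct files_ref_store {
	struct ref_store base;
	int broken_loose_refs; /* toy state standing in for real checks */
};

/* Checked downcast from the generic store to the "files" store. */
static struct files_ref_store *files_downcast(struct ref_store *store)
{
	assert(!strcmp(store->be->name, "files"));
	return (struct files_ref_store *)store;
}

static int files_check_generic(struct ref_store *store)
{
	(void)store; /* nothing generic to report in this toy example */
	return 0;
}

static int files_check_loose(struct ref_store *store)
{
	return files_downcast(store)->broken_loose_refs;
}

static const struct ref_check_be files_check_be = {
	files_check_generic,
	files_check_loose,
};

static const struct ref_storage_be files_be = { "files", &files_check_be };

/* What a caller in fsck might do: run every check the backend provides. */
int fsck_refs(struct ref_store *store)
{
	const struct ref_check_be *c = store->be->check;
	int problems = 0;

	if (c->check_generic)
		problems += c->check_generic(store);
	if (c->check_backend_specific)
		problems += c->check_backend_specific(store);
	return problems;
}
```

The caller only sees `struct ref_store`, so another backend could plug
in its own `ref_check_be` without any change to the caller.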

> For what it's worth, not all of the checks need to be implemented as
> part of GSoC. At a minimum, it should result in the infra to allow for
> backend-specific checks and a couple of checks for at least one of the
> backends.

I think that, using the above idea, we could provide an infrastructure
that allows more checks to be added later.

> You will certainly need to learn about ref internals a bit. There are
> some common rules and restrictions that are important in order to figure
> out what we want to check in the first place. Understanding the
> "reftable" format would be great, but you may also get away with only
> implementing generic or "files"-backend specific consistency checks.
> This depends on the scope you are aiming for.

I think I will at least implement the generic part and the files-backend
consistency checks. I will then read the reftable specs and its source
code. If there is sufficient time available, I think I could implement
all of them. However, I am currently interning remotely, so my responses
may be slow.

* Discuss GSoC: Implement consistency checks for refs
@ 2024-03-06 13:20 shejialuo
  2024-03-06 14:45 ` Patrick Steinhardt
  0 siblings, 1 reply; 4+ messages in thread
From: shejialuo @ 2024-03-06 13:20 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

Hi All,

I am interested in the "Implement consistency checks for refs" GSoC idea.
However, implementing a feature is much harder, so I would like to ask
you some questions to better work on it.

As [1] shows, I think the idea is easy to understand. We need to ensure
the consistency of the refs. The current `git-fsck` only checks the
connectivity from a ref to the object file. There is a possibility that
the ref itself could be corrupted, and we should avoid that through this
project.

I have read some of the source code. Based on what I have learned, I
know there are two backends: one is file and the other is reftable. I
have no idea about reftable currently, so for now I will focus on the
file backend.
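
As an illustration of what a corrupted ref could mean for the file
backend: a loose ref file holds either a hex object ID (40 characters
for SHA-1, 64 for SHA-256) or a symbolic ref of the form
`ref: <target>`. A minimal sketch of a content check under that
assumption follows. The names `classify_loose_ref` and `loose_ref_kind`
are hypothetical, and a real check would likely be stricter.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical names; this is not code from Git. */
enum loose_ref_kind { LOOSE_REF_BAD, LOOSE_REF_OID, LOOSE_REF_SYMREF };

/*
 * Classify the content of a loose ref file given as a NUL-terminated
 * string. Only the first line is inspected. Uppercase hex digits are
 * accepted here, which a stricter check might reject.
 */
enum loose_ref_kind classify_loose_ref(const char *buf)
{
	size_t len, i;

	assert(buf);

	/* Symbolic ref: "ref: " followed by a non-empty target. */
	if (!strncmp(buf, "ref: ", 5))
		return buf[5] && buf[5] != '\n' ? LOOSE_REF_SYMREF
						: LOOSE_REF_BAD;

	/* Otherwise expect exactly 40 or 64 hex digits on the first line. */
	len = strcspn(buf, "\n");
	if (len != 40 && len != 64)
		return LOOSE_REF_BAD;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return LOOSE_REF_BAD;
	return LOOSE_REF_OID;
}
```

A truncated or overwritten ref file would be reported as `LOOSE_REF_BAD`
here without looking at any object file.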

I think the principle behind `git-fsck` is that it traverses every
object file, reads its content, hashes the content with SHA-1, and
compares the value with the stored ref value. So if we want to add
consistency checks for refs, we may need to add a new file to store the
last commit state (and not only the last commit state: do we need to
consider the stash state?). However, from my perspective, it's a bad
idea to use a file to store the refs' states, and we cannot use object
files to check whether a ref is corrupted.

So this is my first question: what mechanism should we use to provide
consistency, and to what extent? I think this mechanism should be
general for both text-based and binary-based refs.

I also have a more general question. I think I need to understand
`fsck.c` and of course the reftable format. However, I am unsure whether
I need to understand the ref internals. Could you please provide more
information to make this idea clearer?

Thanks,
Jialuo

[1] https://lore.kernel.org/git/ZakIPEytlxHGCB9Y@tanuki/
