git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Teach git-fetch(1) to use a quarantine directory
@ 2023-07-17 10:48 Toon Claes
  2023-07-17 14:06 ` Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Toon Claes @ 2023-07-17 10:48 UTC (permalink / raw)
  To: git

Hi,

I've been looking into making git-fetch(1) to use a quarantine
directory, but I'm a bit stuck on direction.

I took git-receive-pack(1) as an example how it uses a quarantine
directory. It seems it sets the environment variables
$GIT_OBJECT_DIRECTORY and $GIT_ALTERNATE_OBJECT_DIRECTORIES so the real
object db is used as an alternative, and a temporary is set as the
default. Then a sub-process is spawned to uses these. In case of
git-receive-pack(1), it calls git-unpack-objects(1).

At the moment git-fetch(1) does not spawn any similar subprocess, so if
we want to take the same approach to use the quarantine, we'll need to
split up that command.

But then we run into another problem as well. git-fetch(1) updates
references, and that is something that's not allowed when using a tmp
object directory.

As far as I can tell from the code, fetching packs and updating refs is
heavily intertwined, so I'm not sure this approach is the best way
forward. So a few questions:

1) Does it even make sense to make use git-fetch(1) use a quarantine
   directory?
2) When making git-fetch(1) use a quarantine directory, what is the
   recommended way to achieve this? Is this by calling a subprocess?
   Maybe git-fetch-pack(1)?

--
Toon

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Teach git-fetch(1) to use a quarantine directory
  2023-07-17 10:48 Teach git-fetch(1) to use a quarantine directory Toon Claes
@ 2023-07-17 14:06 ` Taylor Blau
  2023-07-19  7:24   ` Patrick Steinhardt
  2023-07-17 23:47 ` Jonathan Tan
  2023-07-17 23:56 ` Junio C Hamano
  2 siblings, 1 reply; 5+ messages in thread
From: Taylor Blau @ 2023-07-17 14:06 UTC (permalink / raw)
  To: Toon Claes; +Cc: git

On Mon, Jul 17, 2023 at 12:48:17PM +0200, Toon Claes wrote:
> Hi,
>
> I've been looking into making git-fetch(1) to use a quarantine
> directory, but I'm a bit stuck on direction.

What are you hoping to accomplish? receive-pack quarantines its objects
to ensure that the pre-receive hook(s) are all OK before accepting the
push. See 722ff7f876c (receive-pack: quarantine objects until
pre-receive accepts, 2016-10-03) for more of the details there.

Are you suggesting that fetch be taught the same, so that we can
quarantine the pack sent from a remote before moving it into the main
repository?

> I took git-receive-pack(1) as an example how it uses a quarantine
> directory. It seems it sets the environment variables
> $GIT_OBJECT_DIRECTORY and $GIT_ALTERNATE_OBJECT_DIRECTORIES so the real
> object db is used as an alternative, and a temporary is set as the
> default. Then a sub-process is spawned to uses these. In case of
> git-receive-pack(1), it calls git-unpack-objects(1).
>
> At the moment git-fetch(1) does not spawn any similar subprocess, so if
> we want to take the same approach to use the quarantine, we'll need to
> split up that command.

That doesn't seem necessary, you can use `tmp_objdir_add_as_alternate()`
to register a temporary directory as an alternate. See the tmp-objdir.h
API for more convenience functions.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Teach git-fetch(1) to use a quarantine directory
  2023-07-17 10:48 Teach git-fetch(1) to use a quarantine directory Toon Claes
  2023-07-17 14:06 ` Taylor Blau
@ 2023-07-17 23:47 ` Jonathan Tan
  2023-07-17 23:56 ` Junio C Hamano
  2 siblings, 0 replies; 5+ messages in thread
From: Jonathan Tan @ 2023-07-17 23:47 UTC (permalink / raw)
  To: Toon Claes; +Cc: Jonathan Tan, git

Toon Claes <toon@iotcl.com> writes:
> Hi,
> 
> I've been looking into making git-fetch(1) to use a quarantine
> directory, but I'm a bit stuck on direction.

Thanks for looking into this!

> I took git-receive-pack(1) as an example how it uses a quarantine
> directory. It seems it sets the environment variables
> $GIT_OBJECT_DIRECTORY and $GIT_ALTERNATE_OBJECT_DIRECTORIES so the real
> object db is used as an alternative, and a temporary is set as the
> default. Then a sub-process is spawned to uses these. In case of
> git-receive-pack(1), it calls git-unpack-objects(1).

From a reading of the source code, it may call git-unpack-objects or
git-index-pack, I think. (unpack() in builtin/receive-pack.c)

> At the moment git-fetch(1) does not spawn any similar subprocess, so if
> we want to take the same approach to use the quarantine, we'll need to
> split up that command.

It calls get_pack() in fetch-pack.c, which actually may call one of the
two commands above in the same way. (The calling may happen in various
ways. IIRC in fetch protocol v2, get_pack() is invoked directly by git-
fetch, whereas in earlier protocol versions, it may be the remote helper
that calls git-fetch-pack that invokes this function.)

> But then we run into another problem as well. git-fetch(1) updates
> references, and that is something that's not allowed when using a tmp
> object directory.

execute_commands() in builtin/receive-pack.c has this:

          /*                                                                                                                          
           * Now we'll start writing out refs, which means the objects need                                                           
           * to be in their final positions so that other processes can see them.                                                     
           */                                                                                                                         
          if (tmp_objdir_migrate(tmp_objdir) < 0) {     

which seems to be a solution to the same problem. I think Taylor wrote
something similar [1].

[1] https://lore.kernel.org/git/ZLVK5nzVZU48uvYE@nand.local/

> As far as I can tell from the code, fetching packs and updating refs is
> heavily intertwined, so I'm not sure this approach is the best way
> forward. So a few questions:
> 
> 1) Does it even make sense to make use git-fetch(1) use a quarantine
>    directory?

I don't know off-hand if this will work, but this sounds promising.

> 2) When making git-fetch(1) use a quarantine directory, what is the
>    recommended way to achieve this? Is this by calling a subprocess?
>    Maybe git-fetch-pack(1)?

I also don't know off-hand if this will work, but replicating what git-
receive-pack does makes sense to me.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Teach git-fetch(1) to use a quarantine directory
  2023-07-17 10:48 Teach git-fetch(1) to use a quarantine directory Toon Claes
  2023-07-17 14:06 ` Taylor Blau
  2023-07-17 23:47 ` Jonathan Tan
@ 2023-07-17 23:56 ` Junio C Hamano
  2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2023-07-17 23:56 UTC (permalink / raw)
  To: Toon Claes; +Cc: git

Toon Claes <toon@iotcl.com> writes:

> 1) Does it even make sense to make use git-fetch(1) use a quarantine
>    directory?

What's the goal here?  If the push gets rejected, remove the
incoming packfile data together with quarantine repository?  Who
will be doing such a rejection and how?  Does something run in the
quarantine repository as its main repository while using the final
one as an alternate (presumably some form of hooks) to make that
decision?  If these goals do not make sense, then no, it does not
make sense to teach "fetch" to use a quarantine directory.
Otherwise, teaching "fetch" would be a reasonable way to go forward.

It is a separate issue if the mechanisms added (if any) when we
tought receive-pack to use a quarantine repository are directly
applicable, if if they need some twaking, in order to reuse them in
the context of "git fetch".  But their needing some tweaking does
not mean that it does not make sense to try teaching "fetch" to use
quarantine repository.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Teach git-fetch(1) to use a quarantine directory
  2023-07-17 14:06 ` Taylor Blau
@ 2023-07-19  7:24   ` Patrick Steinhardt
  0 siblings, 0 replies; 5+ messages in thread
From: Patrick Steinhardt @ 2023-07-19  7:24 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Toon Claes, git

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

On Mon, Jul 17, 2023 at 10:06:30AM -0400, Taylor Blau wrote:
> On Mon, Jul 17, 2023 at 12:48:17PM +0200, Toon Claes wrote:
> > Hi,
> >
> > I've been looking into making git-fetch(1) to use a quarantine
> > directory, but I'm a bit stuck on direction.
> 
> What are you hoping to accomplish? receive-pack quarantines its objects
> to ensure that the pre-receive hook(s) are all OK before accepting the
> push. See 722ff7f876c (receive-pack: quarantine objects until
> pre-receive accepts, 2016-10-03) for more of the details there.

> Are you suggesting that fetch be taught the same, so that we can
> quarantine the pack sent from a remote before moving it into the main
> repository?

Yes. If we quarantine received objects it becomes easier to reject them
in case it's determined that they are bogus in any way. 

git-fetch(1) already does perform some checks, namely it will fsck
objects when `fetch.fsckObjects` is enabled. I would consider it an
improvement by itself already if the incoming objects were quarantined
and discarded if we notice that any of them are corrupt. Right now, they
will end up in the repository even if the consistency check declares
them as broken. This makes the whole option less useful in my opinion.

Also, it makes it easier to prune incoming objects in case Git crashes
or gets killed abnormally, e.g. due to a server crash. Mind you, it's
not a perfect solution as the operation may be killed after objects have
already been migrated or midway through. But it at least makes it easier
to clean things up a little bit.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-19  7:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-17 10:48 Teach git-fetch(1) to use a quarantine directory Toon Claes
2023-07-17 14:06 ` Taylor Blau
2023-07-19  7:24   ` Patrick Steinhardt
2023-07-17 23:47 ` Jonathan Tan
2023-07-17 23:56 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).