Possible to update-ref remote repository?

* Possible to update-ref remote repository?
From: Jesse Hopkins @ 2023-09-30 16:34 UTC
  To: git

Hello -

Wondering if it's possible to do the equivalent of the update-ref
command remotely.  Or I guess another way of putting it would be to
git-push to a remote repository without needing a local clone of the
repo.

Trying to do something like:

git push <remote-repo-url>  <sha1>:refs/heads/mybranchtoupdate

where I know that <sha1> already exists on the remote.  I'd like to
avoid the need to clone a local copy of the repo.  Wondering if there
might be some plumbing command(s) that could accomplish this?

Regards,
Jesse

P.S. I think a minimal way to do this using git push would be:

git init
git remote add origin <remote-repo-url>
git fetch origin <sha1>:refs/heads/mybranchtoupdate
git push origin refs/heads/mybranchtoupdate:refs/heads/mybranchtoupdate

Seems that if it's known that <sha1> already exists on the remote, the
fetch is unnecessary network overhead?


* Re: Possible to update-ref remote repository?
From: Junio C Hamano @ 2023-09-30 17:17 UTC
  To: Jesse Hopkins; +Cc: git

Jesse Hopkins <jesse.hops@gmail.com> writes:

> Wondering if it's possible to do the equivalent of the update-ref
> command remotely.  Or I guess another way of putting it would be to
> git-push to a remote repository without needing a local clone of the
> repo.
>
> Trying to do something like:
>
> git push <remote-repo-url>  <sha1>:refs/heads/mybranchtoupdate
>
> where I know that <sha1> already exists on the remote.  I'd like to
> avoid the need to clone a local copy of the repo.  Wondering if there
> might be some plumbing command(s) that could accomplish this?

There is no such command shipped with Git.  I do not think anybody
ever proposed to add such a feature, as it won't be generally useful
in contexts other than one-shot surgery.

But that does not mean it is impossible to do.  At the
protocol level, as long as the receiving end is convinced that such
a push, which does not transfer any object but proposes to update a
ref to a new value, does not corrupt the resulting repository state
after accepting it, the receiving end does not care which objects
the sending end has, or even whether it has a repository at all---the
receiving end cannot even know what the sending end actually has.

The devil however is in the details of ensuring the "as long as the
receiving end is convinced" part.  If you know that an object whose
object name is X is sitting at the tip of branch A at the remote,
and you try to update the tip of branch B to the same object, it is
likely [*] that the remote would notice that such an update after
receiving no new objects is safe.  But if the object X is not
sitting at the tip of any branch or ref (it may be a few commits
behind an existing ref, or it may be dangling ahead of all refs), it
would depend on how the receiving end determines that it has object
chains necessary to complete the new DAG that has X.  The official
versions of the Git client have historically done a thorough job in
check_connected() to even discover fully connected object chains
that allow us to resurrect such a dangling tip of a DAG, but we
cannot complain if third-party Git implementations miss less
obvious cases.

Having said all that, writing a specialized "push lookalike" that
sends the required protocol message that would have been spewed by a
real "git push" client operating in the right environment (i.e., it
has a repository that is a good clone of the receiving repository at
the <remote-repo-url>, the receiving repository has <sha1> and all
the objects that are reachable from it, refs/heads/mybranchtoupdate
points at a commit that is an ancestor of <sha1> right now) should
not be brain surgery, and with such a program you can

 $ git-push-empty <remote-repo-url> <sha1>:refs/heads/mybranchtoupdate

and fool the receiving end into doing what you want, as it
cannot tell if it is talking to a real "git push" client or to your
"push-empty" program from what is coming over the wire.


* Re: Possible to update-ref remote repository?
From: Jesse Hopkins @ 2023-10-01 22:03 UTC
  To: Junio C Hamano; +Cc: git

Thanks for the reply.  Would you be able to point me to some
breadcrumbs for the "required protocol messages"?  I might try to
tinker in some spare time.

FWIW, I had put something like this together a while back using
Gitolite hosting (a dedicated SSH command similar to Gitolite's
symbolic-ref command:
https://github.com/sitaramc/gitolite/blob/master/src/commands/symbolic-ref).

Our org is using GitLab now, and I have been able to put some
functionality together using the GitLab APIs, but it's quite ugly.
I was hoping that maybe there was an intrinsic git protocol
solution, which it seems there could be, though somewhere between
trivial and impossible.

-Jesse


* Re: Possible to update-ref remote repository?
From: Jeff King @ 2023-10-03 20:00 UTC
  To: Jesse Hopkins; +Cc: Junio C Hamano, git

On Sun, Oct 01, 2023 at 04:03:29PM -0600, Jesse Hopkins wrote:

> Thanks for the reply.  Would you be able to point me to some
> breadcrumbs for the "required protocol messages"?  I might try to
> tinker in some spare time.

Try "git help protocol-pack" in a recent version of Git (this used to be
in Documentation/technical/ of the repository, but much of that content
was moved into manpages around the v2.38 timeframe).

For a local or ssh connection, I think it is as simple as:

  # you somehow happen to know this commit exists on the server,
  # and what the current value of the ref is. If you don't know the
  # current value, you can pull it from receive-pack's ref
  # advertisement (I'll leave that as an exercise for the reader).
  old=1234abcd...
  new=5678cdef...
  ref=refs/heads/main

  # we'll use a local repository here, but you can replace the receive-pack
  # invocation below with "ssh $host git receive-pack $repo"
  repo=/path/to/repo.git

  {
    # git's pkt-line format is a 4-byte header with the ascii hex size of
    # the packet, followed by N-4 bytes of data. Each ref update is
    # in its own pkt, but we have just one.
    cmd="$old $new $ref"
    printf "%04x%s" $((${#cmd} + 4)) "$cmd"

    # An all-zero flush packet indicates the end of the list of updates.
    printf "0000"

    # the server insists that we send a valid packfile, even if it is
    # empty. This is from "git help format-pack" (the section on .pack
    # files), though you could also generate it with "git pack-objects
    # --stdout </dev/null".
    printf 'PACK' ;# packfile
    printf '\0\0\0\2' ;# version 2
    printf '\0\0\0\0' ;# zero objects
    # checksum, which is the sha1 of the rest of the pack
    printf "\2\235\10\202\73\330\250\352\265\20\255\152\307\134\202\74\375\76\323\36"
  } |
  git receive-pack "$repo"
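
The loose ends in the script above (finding the current value of the
ref, asking for a status report, and checking the pack checksum) can be
sketched as shell functions. This is only a rough sketch under several
assumptions: the function names are made up for illustration, the
repository is local and uses SHA-1, arguments are plain ASCII, and
coreutils "sha1sum" is installed. Per the protocol-pack documentation,
capabilities go after a NUL byte on the first command.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Git.

# Read a receive-pack ref advertisement on stdin and print the object
# name advertised for ref "$1". The first advertised line carries a NUL
# plus a capability list, so NULs are turned into newlines first; each
# pkt-line then starts with a 4-character length header, stripped here.
advertised_value() {
  tr '\0' '\n' | awk -v ref="$1" '$2 == ref { print substr($1, 5) }'
}

# Emit "<old> <new> <ref>" as one pkt-line, followed by a NUL byte and
# a capability list. Assumes ASCII, so ${#var} is also the byte length.
first_command_pkt() {
  cmd="$1 $2 $3"
  caps="$4"
  # 4 header bytes + command + 1 NUL byte + capabilities
  printf '%04x%s\0%s' $((4 + ${#cmd} + 1 + ${#caps})) "$cmd" "$caps"
}

# Sanity check: SHA-1 (as hex) of the 12-byte header of an empty
# version-2 pack, i.e. the 20 trailer bytes used in push_empty below.
empty_pack_checksum() {
  printf 'PACK\0\0\0\2\0\0\0\0' | sha1sum | cut -d' ' -f1
}

# Point ref "$2" in local repository "$1" at object "$3", which must
# already exist there. A lone flush packet ("0000") requests no updates
# and is sent only to obtain the advertisement.
push_empty() {
  old=$(printf '0000' | git receive-pack "$1" | advertised_value "$2")
  if [ -z "$old" ]; then
    echo "ref $2 not advertised by $1" >&2
    return 1
  fi
  {
    first_command_pkt "$old" "$3" "$2" report-status
    printf '0000'
    printf 'PACK\0\0\0\2\0\0\0\0'
    printf '\2\235\10\202\73\330\250\352\265\20\255\152\307\134\202\74\375\76\323\36'
  } | git receive-pack "$1"
}
```

With the variables from the script above, this would be invoked as
push_empty "$repo" "$ref" "$new". The output is the advertisement again,
followed by the status report: an "unpack ok" line and then either
"ok <ref>" or "ng <ref> <reason>". Running empty_pack_checksum should
print 029d08823bd8a8eab510ad6ac75c823cfd3ed31e, which is the
octal-escaped checksum written out as hex.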

You can get fancier by specifying capabilities (you might want
"report-status", for example).

That will work for local or ssh repos. For http, it gets a little more
complicated. See the section "smart service git-receive-pack" of "git
help protocol-http".
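
For the http case, here is a very rough sketch of the final step only,
assuming curl is available and the server speaks the smart protocol.
The function names and the example URL are hypothetical. Authentication
and the initial ref discovery (a GET of
"info/refs?service=git-receive-pack") are left out, and a given server
may insist on either. The request body is the same stream of bytes that
the { ... } group in the script above writes.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Git.

# Build the receive-pack service URL from a repository URL, tolerating
# one trailing slash.
receive_pack_url() {
  printf '%s/git-receive-pack' "${1%/}"
}

# POST an update request read from stdin (commands, flush packet, and
# pack) to the repository URL given as "$1".
post_receive_pack() {
  curl -sS --fail \
    -H 'Content-Type: application/x-git-receive-pack-request' \
    -H 'Accept: application/x-git-receive-pack-result' \
    --data-binary @- \
    "$(receive_pack_url "$1")"
}
```

Usage would be to pipe the { ... } group into
post_receive_pack https://example.com/repo.git instead of into
"git receive-pack".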

All that said, I do think it might be reasonable for git-push to support
this directly. It is basically:

  1. Let the command run in a non-repo, skipping anything that requires
     it. This _might_ be a maintenance headache, but as a fallback you
     could always run from an empty local repository.

  2. Tell it to always generate an empty pack (basically, a "trust me,
     the other side will be OK with it" option).

The second part looks something like this:

diff --git a/send-pack.c b/send-pack.c
index 89aca9d829..c54463c181 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -58,6 +58,9 @@ static void feed_object(const struct object_id *oid, FILE *fh, int negative)
 	putc('\n', fh);
 }
 
+/* obviously this should be passed down somehow in a real patch */
+#define SPECIAL_EMPTY_PACK_OPTION 1
+
 /*
  * Make a pack stream and spit it out into file descriptor fd
  */
@@ -103,17 +106,19 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
 	 * parameters by writing to the pipe.
 	 */
 	po_in = xfdopen(po.in, "w");
-	for (i = 0; i < advertised->nr; i++)
-		feed_object(&advertised->oid[i], po_in, 1);
-	for (i = 0; i < negotiated->nr; i++)
-		feed_object(&negotiated->oid[i], po_in, 1);
-
-	while (refs) {
-		if (!is_null_oid(&refs->old_oid))
-			feed_object(&refs->old_oid, po_in, 1);
-		if (!is_null_oid(&refs->new_oid))
-			feed_object(&refs->new_oid, po_in, 0);
-		refs = refs->next;
+	if (!SPECIAL_EMPTY_PACK_OPTION) {
+		for (i = 0; i < advertised->nr; i++)
+			feed_object(&advertised->oid[i], po_in, 1);
+		for (i = 0; i < negotiated->nr; i++)
+			feed_object(&negotiated->oid[i], po_in, 1);
+
+		while (refs) {
+			if (!is_null_oid(&refs->old_oid))
+				feed_object(&refs->old_oid, po_in, 1);
+			if (!is_null_oid(&refs->new_oid))
+				feed_object(&refs->new_oid, po_in, 0);
+			refs = refs->next;
+		}
 	}
 
 	fflush(po_in);

Come to think of it, you could probably fake it by wrapping
git-pack-objects with a script that throws away its input. That may be
hard to do because it's a builtin (and we run it as "git pack-objects",
which executes it directly in-process).

-Peff


* Re: Possible to update-ref remote repository?
From: Junio C Hamano @ 2023-10-04  1:13 UTC
  To: Jeff King; +Cc: Jesse Hopkins, git

Jeff King <peff@peff.net> writes:

> All that said, I do think it might be reasonable for git-push to support
> this directly.

Yup.  It certainly is simpler if you can leverage existing helpers.

It will become even simpler in a reasonably modularized world that
hopefully may materialize before we all retire ;-).  I am hoping
that some of the folks who are interested in and talking about
libification can be fooled into doing the necessary work to
introduce proper abstraction, in addition to whatever they are
doing.

Wouldn't it be great if you can have an in-core repository object,
that knows what its object store is, has an index_state object that
is tied to that object store, has a reference database whose values
point into the object store, and if you can choose and mix these
repository components' implementations?  If done right, parts of the
above set of components can be replaced with mock implementations
that are in-core only.

To run "git push --repoint-only there 01beef23:master", you should
be able to start your process totally outside a repository, yet
create an in-core-only repository instance with an in-core-only
object store instance, and because you took the object name to push
on the command line, your in-core object store can "lie" to a call
"create an in-core object for this SHA-1" by returning a fake
in-core commit object, and your in-core-only ref database has that
commit pointed at by some ref.  Then because higher level "client"
code to walk revisions, enumerate refs, etc., would all implement
what they need to do by calling the vtable of these in-core objects,
you can do the "repoint-only" push without being in any repository,
as such an implementation would not touch any filesystem (you can
then plug in different implementations of object store etc., and even
make them perform reasonably well if you manage to do the abstraction
right).

But that would probably be at least 6 months away, even if we had a
handful of competent developers totally dedicated to the effort
without any distraction, and I do not know how likely that is to
happen.
