Git development
From: Jeff King <peff@peff.net>
To: Tim Hockin <thockin@google.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: git fetch --dry-run changes the local repo and transfers data
Date: Tue, 3 Jan 2023 06:07:04 -0500
Message-ID: <Y7QMWN86dDLXc4dZ@coredump.intra.peff.net>
In-Reply-To: <CAO_RewbVicTznpDeCDG0Uqng-MdQ_GKtp-Vgz8kmtaXoczQGOg@mail.gmail.com>

On Tue, Dec 27, 2022 at 10:42:25AM -0800, Tim Hockin wrote:

> Thanks.  What threw me is that I expected it to be "very fast" and
> "very cheap". If I commit a multi-gig file on the server, I see the
> dry-run fetch takes several seconds (clearly indicating some work
> proportional to the server repo size).  I don't want to transfer that
> file on a dry-run, I hoped the server and client were both
> dry-running, and the server could simply say "here's metadata for
> what I _would have_ returned if this was real".  Not possible?

No, the server has no notion of a dry run.

I think the best you could do with fetch is to ask for a smaller set of
objects. For example:

  git fetch --depth=1 --filter=tree:0 \
    https://github.com/git/git \
    2e71cbbddd64695d43383c25c7a054ac4ff86882

will grab a single object. You can even "git show -s 2e71cbbd" on the
result to see it (the "-s" is important to avoid it fetching the trees
to do a diff!). Two things to be aware of:

  - this may have some lingering effects in your repository, as the
    shallow and partial features store some metadata locally to make
    sense of the situation. You're probably best off doing it in a
    temporary repository.

  - not all servers will support --filter; it has to be enabled in the
    server's config (uploadpack.allowFilter).
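
A minimal sketch of the temporary-repository approach, so that the
shallow and partial-clone metadata never touches your real repository.
The helper name probe_commit and its two arguments (a URL and an object
id) are made up for illustration, and it assumes the server lets you
fetch that commit by id:

```shell
# probe_commit URL OID (hypothetical helper): fetch a single commit into a
# throwaway bare repository, print its id and subject, then clean up.
probe_commit() {
  tmp=$(mktemp -d) || return 1
  git init -q --bare "$tmp/probe.git" &&
  git -C "$tmp/probe.git" fetch -q --depth=1 --filter=tree:0 "$1" "$2" &&
  # -s suppresses the diff, which would otherwise need the trees
  git -C "$tmp/probe.git" show -s --format='%H %s' "$2"
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Something like "probe_commit https://github.com/git/git 2e71cbbddd64695d43383c25c7a054ac4ff86882"
should then exit non-zero if the commit cannot be fetched.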

There is potentially a more direct option, though. A while back, commit
a2ba162cda (object-info: support for retrieving object info, 2021-04-20)
added an extension that lets you get the size of an object on the
server. Unfortunately I don't think anybody ever wrote client-side
support. So you'd have to rig up something yourself like:

  # write git's packet format: 4-hex length followed by data
  pkt() {
    printf '%04x%s' "$((4+${#1}))" "$1"
  }

  # a sample input; you should be able to query multiple objects if you
  # want by adding more "oid" lines
  {
    pkt "command=object-info"
    printf "0001"
    pkt "size"
    pkt "oid 2e71cbbddd64695d43383c25c7a054ac4ff86882"
    printf "0000"
  } >input

  # this makes a local request; it's important we're in v2 mode, since
  # the extension only applies there. For http, I think you'd want
  # something like:
  #
  #  curl -H 'Git-Protocol: version=2' \
  #    -H 'Content-Type: application/x-git-upload-pack-request' \
  #    --data-binary @input \
  #    https://example.com/repo.git/git-upload-pack
  #
  # but I didn't test it.
  GIT_PROTOCOL=version=2 git-upload-pack /path/to/repo.git <input >output
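
To make the framing concrete, here is what pkt emits for two of the
lines in the request. The length prefix counts its own four bytes, so
the 19-byte "command=object-info" goes out with prefix 0017 (0x17 is
23). The helper is repeated so the snippet runs on its own; note that
${#1} counts characters, so this is only safe for ASCII payloads:

```shell
# same helper as in the request script: 4-hex length, then the data
pkt() {
  printf '%04x%s' "$((4+${#1}))" "$1"
}

pkt "command=object-info"; echo   # prints 0017command=object-info
pkt "size"; echo                  # prints 0008size
```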

I've left parsing the output as an exercise for the reader. But you
should be able to notice whether the object is present or not based on
the result.
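
If you want a starting point for the parsing, here is a rough sketch of
a pkt-line decoder. The name unpkt is made up, it only handles textual
payloads (no NUL bytes), and it reads a byte at a time with dd, so it
is slow. I have not checked it against a real object-info response; I
believe you would see the capability advertisement first, and then an
attribute line followed by one "<oid> <size>" line per object:

```shell
# unpkt (hypothetical helper): decode pkt-lines from stdin, printing one
# payload per line and naming the special zero-payload packets.
unpkt() {
  while len=$(dd bs=1 count=4 2>/dev/null) && [ -n "$len" ]; do
    case "$len" in
      0000) echo "flush" ;;
      0001) echo "delim" ;;
      0002) echo "response-end" ;;
      *)
        # the length includes its own 4 bytes; the command substitution
        # strips the payload's trailing newline, if it has one
        data=$(dd bs=1 count="$((0x$len - 4))" 2>/dev/null)
        printf '%s\n' "$data"
        ;;
    esac
  done
}
```

You would run it as "unpkt <output" after the upload-pack invocation
above.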

Not all servers may support the extension. For example, I think GitHub's
servers have disabled it.

-Peff
