From: Jonathan Tan <jonathantanmy@google.com>
To: Brandon Williams <bmwill@google.com>
Cc: Jeff Hostetler <git@jeffhostetler.com>,
	git@vger.kernel.org, gitster@pobox.com, peff@peff.net,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
Date: Mon, 11 Dec 2017 11:19:16 -0800
Message-ID: <20171211111916.b3ea2deacba67f6e8416d285@google.com>
In-Reply-To: <20171208223010.GF140529@google.com>

On Fri, 8 Dec 2017 14:30:10 -0800
Brandon Williams <bmwill@google.com> wrote:

> I just finished reading through parts 1-3.  Overall I like the series.
> There are a few points that I'm not a big fan of but I wasn't able to
> come up with a better alternative.  One of these being the need for a
> global variable to tell the fetch-object logic to not go to the server
> to try and fetch a missing object.

I didn't really like that approach either, but I went with it because,
like you, I couldn't come up with a better one. The main issue is that
too many functions (e.g. parse_commit() in commit.c) indirectly read
objects, and I couldn't find a better way to control them all. Ideally,
we should have a "struct object_store" (or maybe "struct repository"
could do this too) on which we can set "fetch_if_missing", and have all
object-reading functions take a pointer to this struct. Or completely
separate the object-reading and object-parsing code (e.g. commit.c
should not be able to read objects at all). Or both.
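
To make the "struct object_store" idea concrete, here is a minimal
sketch of what carrying "fetch_if_missing" on a struct (instead of in a
global) could look like. Everything except the name "fetch_if_missing"
is hypothetical (the struct layout, read_object(), fetch_from_remote(),
the string IDs); it is a toy model, not git's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not git's actual API. */
#define MAX_OBJECTS 8

struct object_store {
	const char *local_ids[MAX_OBJECTS]; /* objects present locally */
	size_t nr_local;
	bool fetch_if_missing;              /* per-store policy, not a global */
	int nr_remote_fetches;              /* counts simulated server requests */
};

bool has_local(const struct object_store *os, const char *id)
{
	for (size_t i = 0; i < os->nr_local; i++)
		if (!strcmp(os->local_ids[i], id))
			return true;
	return false;
}

/* Stand-in for contacting the promisor remote; always succeeds here. */
bool fetch_from_remote(struct object_store *os, const char *id)
{
	assert(os->nr_local < MAX_OBJECTS);
	os->nr_remote_fetches++;
	os->local_ids[os->nr_local++] = id;
	return true;
}

/* Every reader takes the store, so the fetch policy travels with the call. */
bool read_object(struct object_store *os, const char *id)
{
	if (has_local(os, id))
		return true;
	if (!os->fetch_if_missing)
		return false;
	return fetch_from_remote(os, id);
}
```

In this model a caller that must not hit the network clears
"fetch_if_missing" on its own store only, but the cost is visible too:
every function that can reach read_object() needs the extra parameter.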

Any of these would be major undertakings, though, and there are good
reasons why the same function does the reading and parsing (for
example, parse_commit() does not perform any reading if the object has
already been parsed).

> One other thing I noticed was it looks like when you discover that you
> are missing a blob you'll try to fault it in from the server without
> first checking it's an object the server would even have.  Shouldn't you
> first do a check to verify that the object in question is a promised
> object before you go out to contact the server to request it?  You may
> have already ruled this out for some reason I'm not aware of (maybe it's
> too costly to compute?).

It is quite costly to compute - in the worst case, we would need to read
every object of one or more types in every promisor packfile (e.g. if
we know that we're fetching a blob, we need to read every tree) to find
out if the object we want is a promisor object.
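
As a rough illustration of that worst case, the sketch below models a
promisor packfile as an array of toy trees and counts how many trees
must be read to answer "is this blob promised?". The struct, the
function name is_promised() and the string IDs are all hypothetical;
real trees and packfiles are of course not laid out like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a tree is just a short list of the object IDs it references. */
struct toy_tree {
	const char *entries[4];
	size_t nr;
};

/*
 * Report whether any tree in the (hypothetical) promisor pack references
 * `id`. `trees_read` records how many trees had to be inspected.
 */
bool is_promised(const struct toy_tree *trees, size_t nr_trees,
		 const char *id, size_t *trees_read)
{
	assert(id && trees_read);
	*trees_read = 0;
	for (size_t i = 0; i < nr_trees; i++) {
		(*trees_read)++;
		for (size_t j = 0; j < trees[i].nr; j++)
			if (!strcmp(trees[i].entries[j], id))
				return true;
	}
	return false; /* worst case: every tree was read */
}
```

A mistyped ID is exactly the case that forces the full scan, since the
loop can only stop early when it finds a match.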

Such a check would be better at surfacing mistakes (e.g. the user giving
the wrong SHA-1) early, but beyond that, I don't think that having the
check is very important. Consider these two very common situations:

 (1) Fetching a single branch by its tip's SHA-1. A naive implementation
     will first check if we have that SHA-1, which triggers the dynamic
     fetch (since it is an object read), and assuming success, notice
     that we indeed have that tip, and not fetch anything else. The
     check you describe will avoid this situation.
 (2) Dynamically fetching a missing blob by its SHA-1. A naive
     implementation will first check if we have that SHA-1, which
     triggers the dynamic fetch, and that fetch will first check if we
     have that SHA-1, and so on (thus, an infinite loop). The check you
     describe will not avoid that situation.

The check solves (1), but we still need a solution to (2) - I used
"fetch_if_missing", as discussed in your previous question and my answer
to that. A solution to (2) is usually also a solution to (1), so the
check wouldn't help much here.
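
For illustration, situation (2) and the "fetch_if_missing" guard can be
modelled with a toy object database holding a single object. Only the
name "fetch_if_missing" comes from the series; has_object(),
fetch_object() and the counters are hypothetical stand-ins, and the
real code paths are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Global switch, modelled on the "fetch_if_missing" idea discussed above. */
int fetch_if_missing = 1;

int object_present;  /* toy object database: one object, present or not */
int remote_requests; /* how many times we "contacted" the server */
int depth;           /* current nesting of fetch_object() */
int max_depth;       /* deepest nesting observed */

void fetch_object(void);

/* Object read: faults the object in when that is allowed. */
bool has_object(void)
{
	if (!object_present && fetch_if_missing)
		fetch_object();
	return object_present;
}

/* Dynamic fetch: checks what we already have before asking the server. */
void fetch_object(void)
{
	int saved = fetch_if_missing;

	depth++;
	if (depth > max_depth)
		max_depth = depth;
	assert(depth < 100); /* would trip if the guard below were removed */

	fetch_if_missing = 0; /* guard: the check below must not re-enter */
	if (!has_object()) {
		remote_requests++;   /* simulated request to the server */
		object_present = 1;
	}
	fetch_if_missing = saved;
	depth--;
}
```

With the guard in place, the first has_object() call results in one
server request and no nested fetch. The same switch also covers
situation (1): code that only wants to know whether the tip is present
can clear "fetch_if_missing" before asking.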

