From: Junio C Hamano <gitster@pobox.com>
To: Ben Peart <peartben@gmail.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>,
christian.couder@gmail.com
Subject: Re: Partial clone design (with connectivity check for locally-created objects)
Date: Mon, 07 Aug 2017 12:41:23 -0700
Message-ID: <xmqqwp6fp3mk.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <0633771f-ce19-6211-fabe-3f7f676e53ab@gmail.com> (Ben Peart's message of "Mon, 7 Aug 2017 15:12:11 -0400")
Ben Peart <peartben@gmail.com> writes:
> My concern with this proposal is the combination of 1) writing a new
> pack file for every git command that ends up bringing down a missing
> object and 2) gc not compressing those pack files into a single pack
> file.
Your noticing these is a sign that you read the outline of the
design correctly, I think.
The basic idea is that the local fsck should tolerate missing
objects when they are known to be obtainable from that external
service, but should still be able to diagnose missing objects that
we do not know the external service to have, especially the ones
that have been newly created locally and not yet made available to
it by pushing them back.
So we need a way to tell if an object that we do not have (but we
know about) can later be obtained from the external service.
Maintaining an explicit list of such objects obviously is one way,
but we can get the moral equivalent by using pack files. After
receiving a pack file that has a commit from such an external
service, if the commit refers to a parent commit that we do not
have locally, the design proposes that we consider the missing
parent commit to be available at the external service that gave
the pack to us. The same goes for missing trees, blobs, and any
other objects that are supposed to be "reachable" from objects in
such a packfile.
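The rule above can be sketched as follows. This is only an
illustration, not git code: the object store is modelled as a Python
dict from object name to the names it refers to, and `from_service`,
`promised_objects` and the object names are hypothetical.

```python
def promised_objects(store, from_service):
    """Return names of absent objects assumed obtainable from the service.

    store: dict mapping each locally present object name to the list of
           object names it refers to (parents, trees, blobs).
    from_service: names of present objects that arrived in a packfile
           received from the external service.
    """
    promised = set()
    for name in from_service:
        for ref in store.get(name, ()):
            if ref not in store:
                # Referenced by an object the service gave us, but not
                # present locally: treat it as available from the service.
                promised.add(ref)
    return promised


store = {
    "commit-B": ["commit-A", "tree-B"],  # came from the service
    "tree-B": ["blob-1"],                # came from the service
    "commit-C": ["commit-B", "tree-C"],  # created locally
}
from_service = {"commit-B", "tree-B"}
result = promised_objects(store, from_service)
```

Here "commit-A" and "blob-1" end up in `result`, while "tree-C" does
not, because only the locally created "commit-C" refers to it.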
We can extend the approach to cover loose objects if we want to;
just define an alternate object store used internally for this
purpose and place loose objects obtained from such an external
service in that object store.
Because we do not want to leave too many loose objects and small
packfiles lying around, we will need a new way of packing these.
Just enumerate these objects known to have come from the external
service (by being in packfiles marked as such or being loose objects
in the dedicated alternate object store), and create a single larger
packfile, which is marked as "holding the objects that are known to
be in the external service". We do not have such a mode of gc, and
that is a new development that needs to happen, but we know that is
doable.
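A toy model of such a repack, under the assumption that each pack is
a pair of (set of object names, marker) and that the dedicated loose
store is a plain set; `consolidate` and these shapes are made up for
illustration and do not correspond to any existing gc mode.

```python
def consolidate(packs, loose_from_service):
    """Fold every service-provided object into one marked pack.

    packs: list of (objects, marked) pairs, where objects is a set of
           object names and marked says the pack came from the service.
    loose_from_service: set of loose object names kept in the dedicated
           alternate object store.
    Returns the new list of packs; unmarked packs are left untouched.
    """
    combined = set(loose_from_service)
    kept = []
    for objects, marked in packs:
        if marked:
            combined |= objects
        else:
            kept.append((objects, marked))
    if combined:
        # One larger pack, still marked as holding service objects.
        kept.append((combined, True))
    return kept


packs = [({"a", "b"}, True), ({"c"}, True), ({"x"}, False)]
result = consolidate(packs, {"d"})
```

In this model the unmarked pack holding "x" stays separate, on the
assumption that mixing locally created objects into the marked pack
would make their missing references look obtainable as well.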
> That thinking did lead me back to wondering again if we could live
> with a repo specific flag. If any clone/fetch was "partial" the flag
> is set and fsck ignores missing objects whether they came from a
> "partial" remote or not.
The only reason people run "git fsck" is to make sure that their
local repository is sound and they can rely on the objects they have
as the base for building new stuff on top. That is why we are
trying to find a way to make sure "fsck" can be used to detect
broken or missing objects that cannot be obtained from the
lazy-object store, without incurring undue overhead for the normal
codepaths (i.e. outside fsck).
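A rough sketch of the kind of check being aimed at, again not git's
fsck: `store` maps present object names to the names they refer to,
`promised` is a set of absent names assumed obtainable from the
lazy-object store, and `fsck_missing` is a hypothetical name.

```python
def fsck_missing(store, tips, promised):
    """Walk from tips; return absent objects that are not promised."""
    problems = set()
    seen = set()
    stack = list(tips)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name not in store:
            # Absent locally: tolerated only if it is promised.
            if name not in promised:
                problems.add(name)
            continue
        stack.extend(store[name])
    return problems


store = {
    "commit-B": ["commit-A", "tree-B"],  # came from the service
    "tree-B": ["blob-1"],                # came from the service
    "commit-C": ["commit-B", "tree-C"],  # created locally
}
problems = fsck_missing(store, ["commit-C"], {"commit-A", "blob-1"})
```

Only "tree-C" is reported here. A repository-wide flag would amount
to treating every absent name as promised, and "tree-C" would then go
unreported.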
It is OK to go back to wondering again, but I think that essentially
tosses "git fsck" out the window and declares that it is OK to
hope that local objects will never go bad. We can make such a
declaration anytime, but I do not want to see us doing so without
first trying to solve the issue without punting.