From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [PATCH 13/16] prune: keep objects reachable from recent objects
Date: Fri, 3 Oct 2014 20:09:02 -0400 [thread overview]
Message-ID: <20141004000901.GB17063@peff.net> (raw)
In-Reply-To: <xmqq1tqolt9u.fsf@gitster.dls.corp.google.com>
On Fri, Oct 03, 2014 at 02:47:57PM -0700, Junio C Hamano wrote:
> Sorry but this part is beyond a simple panda brain.
I probably didn't explain it very well. I found it rather confusing to
reason about. Let's see if we can go through it together.
> I can understand this
>
> If we have an object, even if it is unreachable, we
> should have its referent.
>
> as a description of the desired behaviour. If we have a tree that
> is unreachable, we must make sure that we do not discard blobs that
> are reachable from that tree, or we would end up corrupting our
> repository if we ever allow that tree to become reachable from our
> refs later.
Yes, exactly.
> > - if we are creating new objects, then we cannot create
> > the parent object without having the child
>
> We cannot create the parent (e.g. "tree") without having the child
> (e.g. "blob that is referred to by the tree we are creating").
> So this bullet point is repeating the same thing?
Sort of. The first statement "if we have an object, we should have its
referent" is a desired state. The bullet points are trying to reason
about the _changes_ in state, and argue that there are no state changes
that take us from a valid state to an invalid one.
There are two actions that affect the state of the object database:
adding objects and removing objects.
We cannot go from a valid state to an invalid state by adding objects,
because we cannot create the parent without having the child. That is
already the case before this patch (though as noted, you can "cheat"
if you know the sha1 from another repository and bypass git's safety
checks, but I do not think that is a case worth worrying about).
> > - and if we are pruning objects, will not prune the child
> > if we are keeping the parent
>
> We will not prune "blob" that are reachable from a "tree" that we
> are not yet ready to prune. So this again is repeating the same
> thing?
This is the other action. When removing objects, we will keep the child
along with the parent, and therefore we cannot move to an invalid state.
That's the part that this patch does.
> With this patch applied, the system will not prune unreachable old
> objects that are reachable from a recent object (the recent object
> itself may or may not be reachable but that does not make any
> difference). And that is sufficient to ensure the integrity of the
> repository even if you allow new objects to be created reusing any
> of these unreachable objects that are left behind by prune, because
> the reachability check done during prune (with this patch applied)
> makes sure any object left in the repository can safely be used as a
> starting point of connectivity traversal.
Yes, exactly.
> Ok, I think I got it now, but then do we still need to utime(2) the
> loose object files for unreachable objects that are referenced by
> a recent object (which is done in a later patch), or is that purely
> an optimization for the next round of gc where you would have more
> recent objects (i.e. you do not have to traverse to find out an old
> one is reachable from a new one, as there will be fewer old ones)?
No, we don't need to utime() the loose objects. As long as there is a
recently-written object that refers to them, they are considered worth
keeping.
The later patch that calls utime() on objects is really about defeating
the write_sha1_file optimization. That is, you might _think_ you have
written a recent object that refers to other objects, but the sha1-file
code silently turned it into a noop.
Does that make more sense?
-Peff
next prev parent reply other threads:[~2014-10-04 0:09 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-03 20:20 [PATCH 0/16] make prune mtime-checking more careful Jeff King
2014-10-03 20:21 ` [PATCH 01/16] foreach_alt_odb: propagate return value from callback Jeff King
2014-10-03 22:55 ` René Scharfe
2014-10-04 0:31 ` Jeff King
2014-10-03 20:22 ` [PATCH 02/16] isxdigit: cast input to unsigned char Jeff King
2014-10-03 20:22 ` [PATCH 03/16] object_array: factor out slopbuf-freeing logic Jeff King
2014-10-07 11:25 ` Michael Haggerty
2014-10-08 7:36 ` Jeff King
2014-10-08 8:40 ` Michael Haggerty
2014-10-08 8:55 ` Jeff King
2014-10-03 20:22 ` [PATCH 04/16] object_array: add a "clear" function Jeff King
2014-10-03 20:23 ` [PATCH 05/16] clean up name allocation in prepare_revision_walk Jeff King
2014-10-03 20:24 ` [PATCH 06/16] reachable: clear pending array after walking it Jeff King
2014-10-03 20:25 ` [PATCH 07/16] t5304: use test_path_is_* instead of "test -f" Jeff King
2014-10-03 22:12 ` Junio C Hamano
2014-10-03 20:27 ` [PATCH 08/16] t5304: use helper to report failure of "test foo = bar" Jeff King
2014-10-03 22:17 ` Junio C Hamano
2014-10-04 0:13 ` Jeff King
2014-10-07 13:21 ` Michael Haggerty
2014-10-07 17:29 ` Junio C Hamano
2014-10-07 20:18 ` Jeff King
2014-10-07 20:35 ` Junio C Hamano
2014-10-07 21:29 ` Jeff King
2014-10-07 21:53 ` Junio C Hamano
2014-10-07 22:17 ` Michael Haggerty
2014-10-08 1:13 ` Jeff King
2014-10-08 16:58 ` Junio C Hamano
2014-10-07 21:16 ` Junio C Hamano
2014-10-03 20:29 ` [PATCH 09/16] prune: factor out loose-object directory traversal Jeff King
2014-10-03 22:19 ` Junio C Hamano
2014-10-04 0:24 ` Jeff King
2014-10-07 14:07 ` Michael Haggerty
2014-10-08 7:33 ` Jeff King
2014-10-03 20:31 ` [PATCH 10/16] count-objects: do not use xsize_t when counting object size Jeff King
2014-10-03 20:31 ` [PATCH 11/16] count-objects: use for_each_loose_file_in_objdir Jeff King
2014-10-03 20:32 ` [PATCH 12/16] sha1_file: add for_each iterators for loose and packed objects Jeff King
2014-10-05 8:15 ` René Scharfe
2014-10-05 10:47 ` Ramsay Jones
2014-10-03 20:39 ` [PATCH 13/16] prune: keep objects reachable from recent objects Jeff King
2014-10-03 21:47 ` Junio C Hamano
2014-10-04 0:09 ` Jeff King [this message]
2014-10-04 0:30 ` Jeff King
2014-10-04 3:04 ` Junio C Hamano
2014-10-07 16:29 ` Michael Haggerty
2014-10-08 7:19 ` Jeff King
2014-10-08 10:37 ` Michael Haggerty
2014-10-03 20:39 ` [PATCH 14/16] pack-objects: refactor unpack-unreachable expiration check Jeff King
2014-10-03 20:40 ` [PATCH 15/16] pack-objects: match prune logic for discarding objects Jeff King
2014-10-03 20:41 ` [PATCH 16/16] write_sha1_file: freshen existing objects Jeff King
2014-10-03 21:29 ` Junio C Hamano
2014-10-04 0:01 ` Jeff King
2014-10-05 9:12 ` René Scharfe
2014-10-03 22:20 ` [PATCH 0/16] make prune mtime-checking more careful Junio C Hamano
2014-10-04 22:22 ` Junio C Hamano
2014-10-05 9:19 ` René Scharfe
2014-10-06 1:42 ` Jeff King
2014-10-08 8:31 ` Jeff King
2014-10-08 17:03 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141004000901.GB17063@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).