From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/25] prune-safety
Date: Wed, 15 Oct 2014 18:32:44 -0400
Message-ID: <20141015223244.GA25368@peff.net>
Here's a re-roll of the patch series I posted earlier to make "git
prune" keep more contiguous chunks of the object graph. The cleanups to
t5304 were spun off into their own series, and are dropped here.
However, the other patches seem to have multiplied in number (I must
have fed them after midnight).
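To make "keep more contiguous chunks of the object graph" concrete, here
is a toy sketch of the rule that the "keep objects reachable from recent
objects" patch is named for. It is not the code from the series: struct
toy_object, keep_recursive, and toy_prune_mark are hypothetical names,
and the object graph is just an array of indices. The assumption is that
an object survives if it is reachable, if its mtime is newer than the
expiration, or if some surviving object points to it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINKS 4

/* Toy stand-in for an object; this is NOT git's struct object. */
struct toy_object {
	int reachable;        /* reachable from refs, reflogs, or the index */
	long mtime;           /* last-modified time, in seconds */
	int nr_links;         /* number of valid entries in links[] */
	int links[MAX_LINKS]; /* indices of the objects this one points to */
	int keep;             /* output: 1 if the object survives pruning */
};

/* Mark one object, and everything it points to, as kept. */
static void keep_recursive(struct toy_object *objs, size_t nr, int idx)
{
	int i;

	assert(idx >= 0 && (size_t)idx < nr);
	if (objs[idx].keep)
		return; /* already visited; also guards against cycles */
	objs[idx].keep = 1;
	for (i = 0; i < objs[idx].nr_links; i++)
		keep_recursive(objs, nr, objs[idx].links[i]);
}

/*
 * Keep reachable objects, objects newer than "expire", and anything
 * those objects reference, so that a recent object never ends up
 * pointing at a pruned one.
 */
static void toy_prune_mark(struct toy_object *objs, size_t nr, long expire)
{
	size_t i;

	for (i = 0; i < nr; i++)
		objs[i].keep = 0;
	for (i = 0; i < nr; i++)
		if (objs[i].reachable || objs[i].mtime > expire)
			keep_recursive(objs, nr, (int)i);
}
```

With a recent unreachable object that points to an old unreachable one,
both are kept; an old unreachable object that nothing recent points to
is still dropped.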
Here are the changes since the first round (thanks everybody for your
comments):
- fix bogus return values from freshen_file, foreach_alt_odb, and
for_each_packed_object
- make for_each_object_in_pack static
- clarify commit message for "keep objects reachable from recent
objects" patch (this was the one that confused Junio, and I
elaborated based on our discussion)
- clarify the definition of "loose object dirs" in the comment above
for_each_loose_file_in_object_dir
- in for_each_loose_file, traverse hashed loose object directories in
numeric order, and pass the number to the subdir callback (this is
used by prune-packed for its progress updates); as a side effect,
this fixes the bugs Michael noticed with the subdir callback.
- prune-packed now reuses the for_each_loose_file interface
- use revs->ignore_missing_links so we don't barf on already-missing
unreferenced objects
- convert reachable.c to use traverse_commit_list instead of its own
custom walk; this gives support for ignore_missing_links above, and
saves us a fair bit of code.
- while in the area, I noticed that reachable.c's reflog handling is
the same as rev-list's --reflog option; it now builds on what's in
revision.c.
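As an illustration of the traversal order and return-value items above,
here is a minimal sketch of walking the 256 hashed directories in
numeric order, handing the number to a per-directory callback, and
propagating a non-zero return from that callback. It is a sketch under
assumptions, not the interface from the series: toy_for_each_subdir,
toy_subdir_fn, struct toy_progress, and toy_count_subdir are all
hypothetical names, and the sketch only builds path names without
touching the filesystem.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback type: gets the directory number and its path. */
typedef int (*toy_subdir_fn)(int nr, const char *path, void *data);

/*
 * Visit "<objdir>/00" through "<objdir>/ff" in numeric order. Stop at
 * the first callback that returns non-zero and hand that value back to
 * the caller instead of swallowing it.
 */
static int toy_for_each_subdir(const char *objdir, toy_subdir_fn cb, void *data)
{
	char path[4096];
	int i, r = 0;

	assert(objdir && cb);
	for (i = 0; i < 256 && !r; i++) {
		snprintf(path, sizeof(path), "%s/%02x", objdir, i);
		r = cb(i, path, data);
	}
	return r;
}

/* Example callback state: the kind of thing a progress meter could use. */
struct toy_progress {
	int seen;    /* how many directories were visited */
	int last_nr; /* number of the most recent directory */
	int stop_at; /* return an error at this number; -1 means never */
};

static int toy_count_subdir(int nr, const char *path, void *data)
{
	struct toy_progress *p = data;

	(void)path; /* unused in this toy callback */
	p->seen++;
	p->last_nr = nr;
	return nr == p->stop_at ? -1 : 0;
}
```

Because the callback receives the directory number, a caller can report
progress as "nr + 1 of 256" without parsing the path.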
That takes us up to patch 17. While working in reachable.c, I noticed an
oddity: we consider objects in the index to be reachable during prune
(which is good), but we do not when dropping them during a repack that
uses --unpack-unreachable=<expiration>. The remaining patches fix that,
which needed a fair bit of preparatory cleanup.
I'm really beginning to question whether the "just drop objects that are
about to be pruned" optimization done in 7e52f56 (gc: do not explode
objects which will be immediately pruned, 2012-04-07) is worth it. It really
complicates things as pack-objects and prune need to have the exact same
rules (and implementing it naively, by having pack-objects run the same
code as prune, is not desirable because pack-objects has _already_ done
a complete expensive traversal to generate the packing list). And I fear
it will get even worse if we implement some of the race-condition fixes
that Michael suggested earlier.
On the other hand, the loosening behavior without 7e52f56 has some
severe pathological cases. A repository which has had a chunk of history
deleted can easily increase in size by several orders of magnitude due to
loosening (since we lose the benefit of all deltas in the loosened
objects).
Finally, there are a few things that were discussed that I didn't
address/fix. I don't think any of them is a critical blocker, but I
did want to summarize the state:
- when refreshing, we may update a pack's mtime multiple times. It
probably wouldn't be too hard to cache this and only update once per
program run, but I also don't think it's that big a deal in
practice.
- We will munge mtimes of objects found in alternates. If we don't
have write access to the alternate, we'll create a local duplicate
of the object. This is the safer thing, but I'm not sure if there
are cases where we might try to write out a large number of objects
which exist in an alternate (OTOH, we will eventually drop them at
the next repack).
- I didn't implement the "sort by inode" trick that fsck does when
traversing the loose objects. It wouldn't be too hard, but I'm not
convinced it's actually important.
- I didn't convert fsck to the for_each_loose_file interface (mostly
because I didn't do the inode-sorting trick, and while I don't think
it matters, I didn't go to the work to show that it _doesn't_).
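For the first item, the "only update once per program run" idea could
look roughly like the sketch below. This is an assumption about one
possible shape, not something the series implements: struct toy_pack,
its freshened flag, toy_touch, and toy_freshen_pack are hypothetical,
and toy_touch merely counts calls where real code would update the
file's mtime.

```c
#include <assert.h>

/* Toy stand-in for a pack; this is NOT git's struct packed_git. */
struct toy_pack {
	const char *path;
	int freshened; /* set once the mtime was updated in this run */
};

static int toy_touch_calls;

/* Stand-in for the real mtime update; it only counts invocations. */
static int toy_touch(const char *path)
{
	(void)path;
	toy_touch_calls++;
	return 0;
}

/* Returns 1 if the pack is fresh, 0 if the update failed. */
static int toy_freshen_pack(struct toy_pack *p)
{
	assert(p);
	if (p->freshened)
		return 1; /* already done during this run; skip the syscall */
	if (toy_touch(p->path) < 0)
		return 0;
	p->freshened = 1;
	return 1;
}
```

Calling toy_freshen_pack repeatedly on the same pack then costs one
update plus a flag check per later call.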
Here are the patches:
[01/25]: foreach_alt_odb: propagate return value from callback
[02/25]: isxdigit: cast input to unsigned char
[03/25]: object_array: factor out slopbuf-freeing logic
[04/25]: object_array: add a "clear" function
[05/25]: clean up name allocation in prepare_revision_walk
[06/25]: reachable: use traverse_commit_list instead of custom walk
[07/25]: reachable: reuse revision.c "add all reflogs" code
[08/25]: prune: factor out loose-object directory traversal
[09/25]: reachable: mark index blobs as SEEN
[10/25]: prune-packed: use for_each_loose_file_in_objdir
[11/25]: count-objects: do not use xsize_t when counting object size
[12/25]: count-objects: use for_each_loose_file_in_objdir
[13/25]: sha1_file: add for_each iterators for loose and packed objects
[14/25]: prune: keep objects reachable from recent objects
[15/25]: pack-objects: refactor unpack-unreachable expiration check
[16/25]: pack-objects: match prune logic for discarding objects
[17/25]: write_sha1_file: freshen existing objects
[18/25]: make add_object_array_with_context interface more sane
[19/25]: traverse_commit_list: support pending blobs/trees with paths
[20/25]: rev-list: document --reflog option
[21/25]: rev-list: add --index-objects option
[22/25]: reachable: use revision machinery's --index-objects code
[23/25]: pack-objects: use argv_array
[24/25]: repack: pack objects mentioned by the index
[25/25]: pack-objects: double-check options before discarding objects