git.vger.kernel.org archive mirror
* [PATCH v2 0/25] prune-safety
@ 2014-10-15 22:32 Jeff King
  2014-10-15 22:33 ` [PATCH v2 01/25] foreach_alt_odb: propagate return value from callback Jeff King
                   ` (25 more replies)
  0 siblings, 26 replies; 48+ messages in thread
From: Jeff King @ 2014-10-15 22:32 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Junio C Hamano

Here's a re-roll of the patch series I posted earlier to make "git
prune" keep more contiguous chunks of the object graph. The cleanups to
t5304 were spun off into their own series, and are dropped here.
However, the other patches seem to have multiplied in number (I must
have fed them after midnight).
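
[Illustration, not part of the series] A minimal sketch of what "keeping
contiguous chunks" means, assuming a toy object store. The dict layout and
the names objects_to_keep, tips, and expire are hypothetical, and the real
code is C built on the revision machinery. Anything reachable from a ref is
kept, anything newer than the expiration is kept, and so is everything
reachable from those recent objects, however old:

```python
def objects_to_keep(objects, tips, expire):
    """Return the set of object names that a prune should keep.

    objects: dict mapping name -> (mtime, [names this object references])
    tips:    names reachable from refs, reflogs, the index, and so on
    expire:  objects with mtime > expire count as "recent"
    """
    keep = set()
    # Start from the reachable tips plus every recent object.
    stack = list(tips)
    stack.extend(name for name, (mtime, _) in objects.items() if mtime > expire)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        if name not in objects:
            # Already-missing object: skip it instead of failing the walk.
            continue
        keep.add(name)
        stack.extend(objects[name][1])
    return keep
```

Without the second seed set, an old tree or blob referenced only by a recent
unreachable commit would be dropped, leaving that commit broken.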

Here are the changes since the first round (thanks everybody for your
comments):

  - fix bogus return values from freshen_file, foreach_alt_odb, and
    for_each_packed_object

  - make for_each_object_in_pack static

  - clarify commit message for "keep objects reachable from recent
    objects" patch (this was the one that confused Junio, and I
    elaborated based on our discussion)

  - clarify the definition of "loose object dirs" in the comment above
    for_each_loose_file_in_object_dir

  - in for_each_loose_file, traverse hashed loose object directories in
    numeric order, and pass the number to the subdir callback (this is
    used by prune-packed for its progress updates); as a side effect,
    this fixes the bugs Michael noticed with the subdir callback.

  - prune-packed now reuses the for_each_loose_file interface

  - use revs->ignore_missing_links so we don't barf on already-missing
    unreferenced objects

  - convert reachable.c to use traverse_commit_list instead of its own
    custom walk; this gives support for ignore_missing_links above, and
    saves us a fair bit of code.

  - while in the area, I noticed that reachable.c's reflog handling is
    the same as rev-list's --reflog option; it now builds on what's in
    revision.c.
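
[Illustration, not part of the series] The numeric-order traversal of the
hashed loose object directories can be sketched roughly as below. This
assumes the usual two-hex-digit directory plus 38-hex-digit file layout; the
function name and callback signatures are hypothetical stand-ins, not the
actual C interface:

```python
import os

HEX_DIGITS = set("0123456789abcdef")


def for_each_loose_file(objdir, obj_cb, subdir_cb=None):
    """Walk loose objects under objdir, visiting subdirs 00..ff in order.

    obj_cb(hex_name, path) is called for each object file.
    subdir_cb(nr, path) is called after each existing subdirectory, with
    nr being the directory's numeric value (0..255), e.g. for progress.
    A nonzero return from either callback stops the walk and is returned.
    """
    for nr in range(256):
        subdir = os.path.join(objdir, "%02x" % nr)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            # Skip anything that does not look like a loose object name.
            if len(name) != 38 or not set(name) <= HEX_DIGITS:
                continue
            ret = obj_cb("%02x%s" % (nr, name), os.path.join(subdir, name))
            if ret:
                return ret
        if subdir_cb is not None:
            ret = subdir_cb(nr, subdir)
            if ret:
                return ret
    return 0
```

Because the loop counts from 0 to 255, the number passed to the subdir
callback rises monotonically, which is what a progress meter needs.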

That takes us up to patch 17. While working in reachable.c, I noticed an
oddity: we consider objects in the index to be reachable during prune
(which is good), but we do not when dropping them during a repack that
uses --unpack-unreachable=<expiration>. The remaining patches fix that,
which needed a fair bit of preparatory cleanup.

I'm really beginning to question whether the "just drop objects that are
about to be pruned" optimization done in 7e52f56 (gc: do not explode
objects which will be immediately pruned, 2012-04-07) is worth it. It really
complicates things as pack-objects and prune need to have the exact same
rules (and implementing it naively, by having pack-objects run the same
code as prune, is not desirable because pack-objects has _already_ done
a complete expensive traversal to generate the packing list). And I fear
it will get even worse if we implement some of the race-condition fixes
that Michael suggested earlier.

On the other hand, the loosening behavior without 7e52f56 has some
severe pathological cases. A repository which has had a chunk of history
deleted can easily increase in size by several orders of magnitude due to
loosening (since we lose the benefit of all deltas in the loosened
objects).

Finally, there are a few things that were discussed that I didn't
address/fix. I don't think any of them is a critical blocker, but I
did want to summarize the state:

  - when refreshing, we may update a pack's mtime multiple times. It
    probably wouldn't be too hard to cache this and only update once per
    program run, but I also don't think it's that big a deal in
    practice.

  - We will munge mtimes of objects found in alternates. If we don't
    have write access to the alternate, we'll create a local duplicate
    of the object. This is the safer thing, but I'm not sure if there
    are cases where we might try to write out a large number of objects
    which exist in an alternate (OTOH, we will eventually drop them at
    the next repack).

  - I didn't implement the "sort by inode" trick that fsck does when
    traversing the loose objects. It wouldn't be too hard, but I'm not
    convinced it's actually important.

  - I didn't convert fsck to the for_each_loose_file interface (mostly
    because I didn't do the inode-sorting trick, and while I don't think
    it matters, I didn't go to the work to show that it _doesn't_).
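
[Illustration, not part of the series] The "only update once per program
run" idea from the first item above could look roughly like this. It is a
sketch under the assumption that freshening just means bumping the file's
mtime; freshen_pack_once and the module-level cache are hypothetical names:

```python
import os

# Paths whose mtime has already been bumped during this program run.
_freshened = set()


def freshen_file(path):
    """Set path's mtime to the current time; return True on success."""
    try:
        os.utime(path, None)
    except OSError:
        return False
    return True


def freshen_pack_once(path):
    """Freshen a pack at most once per run, remembering only successes."""
    if path in _freshened:
        return True
    if not freshen_file(path):
        return False
    _freshened.add(path)
    return True
```

Failures are not cached, so a caller that cannot touch the file (for
example, a read-only alternate) still sees False each time and can fall back
to writing a local copy.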

Here are the patches:

  [01/25]: foreach_alt_odb: propagate return value from callback
  [02/25]: isxdigit: cast input to unsigned char
  [03/25]: object_array: factor out slopbuf-freeing logic
  [04/25]: object_array: add a "clear" function
  [05/25]: clean up name allocation in prepare_revision_walk
  [06/25]: reachable: use traverse_commit_list instead of custom walk
  [07/25]: reachable: reuse revision.c "add all reflogs" code
  [08/25]: prune: factor out loose-object directory traversal
  [09/25]: reachable: mark index blobs as SEEN
  [10/25]: prune-packed: use for_each_loose_file_in_objdir
  [11/25]: count-objects: do not use xsize_t when counting object size
  [12/25]: count-objects: use for_each_loose_file_in_objdir
  [13/25]: sha1_file: add for_each iterators for loose and packed objects
  [14/25]: prune: keep objects reachable from recent objects
  [15/25]: pack-objects: refactor unpack-unreachable expiration check
  [16/25]: pack-objects: match prune logic for discarding objects
  [17/25]: write_sha1_file: freshen existing objects
  [18/25]: make add_object_array_with_context interface more sane
  [19/25]: traverse_commit_list: support pending blobs/trees with paths
  [20/25]: rev-list: document --reflog option
  [21/25]: rev-list: add --index-objects option
  [22/25]: reachable: use revision machinery's --index-objects code
  [23/25]: pack-objects: use argv_array
  [24/25]: repack: pack objects mentioned by the index
  [25/25]: pack-objects: double-check options before discarding objects


Thread overview: 48+ messages
2014-10-15 22:32 [PATCH v2 0/25] prune-safety Jeff King
2014-10-15 22:33 ` [PATCH v2 01/25] foreach_alt_odb: propagate return value from callback Jeff King
2014-10-15 22:34 ` [PATCH v2 02/25] isxdigit: cast input to unsigned char Jeff King
2014-10-16 17:16   ` Junio C Hamano
2014-10-15 22:34 ` [PATCH v2 03/25] object_array: factor out slopbuf-freeing logic Jeff King
2014-10-16 17:39   ` Junio C Hamano
2014-10-17  0:33     ` git-bundle rev handling and de-duping Jeff King
2014-10-17 21:03       ` Philip Oakley
2014-10-17 22:41         ` Junio C Hamano
2014-10-15 22:34 ` [PATCH v2 04/25] object_array: add a "clear" function Jeff King
2014-10-15 22:35 ` [PATCH v2 05/25] clean up name allocation in prepare_revision_walk Jeff King
2014-10-15 22:37 ` [PATCH v2 06/25] reachable: use traverse_commit_list instead of custom walk Jeff King
2014-10-16 17:53   ` Junio C Hamano
2014-10-15 22:38 ` [PATCH v2 07/25] reachable: reuse revision.c "add all reflogs" code Jeff King
2014-10-15 22:38 ` [PATCH v2 08/25] prune: factor out loose-object directory traversal Jeff King
2014-10-15 22:40 ` [PATCH v2 09/25] reachable: mark index blobs as SEEN Jeff King
2014-10-15 22:40 ` [PATCH v2 10/25] prune-packed: use for_each_loose_file_in_objdir Jeff King
2014-10-15 22:40 ` [PATCH v2 11/25] count-objects: do not use xsize_t when counting object size Jeff King
2014-10-15 22:41 ` [PATCH v2 12/25] count-objects: use for_each_loose_file_in_objdir Jeff King
2014-10-15 22:41 ` [PATCH v2 13/25] sha1_file: add for_each iterators for loose and packed objects Jeff King
2014-10-15 22:41 ` [PATCH v2 14/25] prune: keep objects reachable from recent objects Jeff King
2014-10-15 22:41 ` [PATCH v2 15/25] pack-objects: refactor unpack-unreachable expiration check Jeff King
2014-10-15 22:42 ` [PATCH v2 16/25] pack-objects: match prune logic for discarding objects Jeff King
2014-10-15 22:42 ` [PATCH v2 17/25] write_sha1_file: freshen existing objects Jeff King
2014-10-15 22:42 ` [PATCH v2 18/25] make add_object_array_with_context interface more sane Jeff King
2014-10-15 22:43 ` [PATCH v2 19/25] traverse_commit_list: support pending blobs/trees with paths Jeff King
2014-10-15 22:43 ` [PATCH v2 20/25] rev-list: document --reflog option Jeff King
2014-10-15 22:44 ` [PATCH v2 21/25] rev-list: add --index-objects option Jeff King
2014-10-16 18:41   ` Junio C Hamano
2014-10-17  0:12     ` Jeff King
2014-10-17  0:43       ` Jeff King
2014-10-17  0:44         ` [PATCH v3 22/26] rev-list: add --indexed-objects option Jeff King
2014-10-17  0:44         ` [PATCH v3 23/26] reachable: use revision machinery's --indexed-objects code Jeff King
2014-10-17  0:44         ` [PATCH v3 24/26] pack-objects: use argv_array Jeff King
2014-10-17  0:44         ` [PATCH v3 25/26] repack: pack objects mentioned by the index Jeff King
2014-10-17  0:44         ` [PATCH v3 26/26] pack-objects: double-check options before discarding objects Jeff King
2014-10-15 22:44 ` [PATCH v2 22/25] reachable: use revision machinery's --index-objects code Jeff King
2014-10-15 22:45 ` [PATCH v2 23/25] pack-objects: use argv_array Jeff King
2014-10-15 22:46 ` [PATCH v2 24/25] repack: pack objects mentioned by the index Jeff King
2014-10-15 22:48 ` [PATCH v2 25/25] pack-objects: double-check options before discarding objects Jeff King
2014-10-16 21:07 ` [PATCH v2 0/25] prune-safety Junio C Hamano
2014-10-16 21:10   ` Junio C Hamano
2014-10-16 21:21   ` Jeff King
2014-10-16 21:39     ` Jeff King
2014-10-16 22:18       ` Junio C Hamano
2014-10-17  0:03         ` Jeff King
     [not found]       ` <CAPc5daX0AFv9jDrFyd_OnupW5AfZW9Je_rgzaViX_xxs3SG5zg@mail.gmail.com>
2014-10-17  4:49         ` Jeff King
2014-10-18 12:31       ` Jeff King
