git.vger.kernel.org archive mirror
* [PATCH v3 0/5] Speed up repacking when lots of pack-kept objects
@ 2019-06-24 12:07 Nathaniel Filardo
  2019-06-24 12:07 ` [PATCH v3 1/5] count-objects: report statistics about kept packs Nathaniel Filardo
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Nathaniel Filardo @ 2019-06-24 12:07 UTC (permalink / raw)
  To: git; +Cc: stolee, Nathaniel Filardo

This patch series improves handling of very large repositories, as generated
by, for example, bup (https://github.com/bup/bup).  Prolonged operation
thereof creates quite a lot of small pack files; repacking improves
filesystem performance of the objects/pack directory, but is quite
expensive in terms of time and memory.  We have adopted a strategy that
marks "large" (tens of GB) pack files as "kept" and defers repacking
until there are enough un-kept packs or enough bytes of un-kept objects.
(The first patch in the series will make our accounting easier, replacing
some terrible shell scripting with grep.)
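
As a rough illustration of the policy described above (not the scripts we
actually run), the sketch below marks large packs as kept by creating the
sibling .keep file that git recognises, and decides when a repack is due.
The threshold values and function names are hypothetical, and on-disk pack
size is used as a stand-in for "bytes of un-kept objects".

```python
from pathlib import Path

# Hypothetical thresholds: the text above only says "tens of GB" for a
# large pack and gives no numbers for the repack triggers.
LARGE_PACK_BYTES = 10 * 2**30
MAX_UNKEPT_PACKS = 50
MAX_UNKEPT_BYTES = 5 * 2**30


def is_kept(pack):
    """A pack counts as kept when a sibling .keep file exists."""
    return pack.with_suffix(".keep").exists()


def mark_large_packs_kept(pack_dir, large_bytes=LARGE_PACK_BYTES):
    """Create .keep files for un-kept packs at or above large_bytes."""
    newly_kept = []
    for pack in sorted(Path(pack_dir).glob("pack-*.pack")):
        if not is_kept(pack) and pack.stat().st_size >= large_bytes:
            pack.with_suffix(".keep").touch()
            newly_kept.append(pack)
    return newly_kept


def should_repack(pack_dir, max_packs=MAX_UNKEPT_PACKS,
                  max_bytes=MAX_UNKEPT_BYTES):
    """Repack once there are enough un-kept packs or un-kept bytes."""
    unkept = [p for p in Path(pack_dir).glob("pack-*.pack")
              if not is_kept(p)]
    total = sum(p.stat().st_size for p in unkept)
    return len(unkept) >= max_packs or total >= max_bytes
```

Both helpers would be pointed at the objects/pack directory of the
repository, for instance after each bup save operation.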

While this strategy has generally improved our lives relative to either
extreme (not repacking, or repacking after every bup save operation), it
still leaves a good bit to be desired.  Because our packs are marked as
kept, repacking will leave the objects therein alone, but it still must
instantiate in memory and walk the entire object graph.  However, because
our kept packs are transitively closed, such that an object in one
necessarily references only objects in other kept packs, we should like to
avoid reasoning about them more or less altogether.
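
To make the closure argument concrete, here is a toy model (not git's
revision walk; the graph, object names, and function are invented for
illustration).  When the kept set is transitively closed, a walk that stops
at the first kept object finds the same un-kept objects as a full walk
while visiting fewer nodes.

```python
def reachable_unkept(graph, tips, kept, stop_at_kept):
    """Return (un-kept objects reachable from tips, nodes visited).

    graph maps each object name to the objects it references.  With
    stop_at_kept=True the walk does not descend below a kept object.
    """
    seen = set()
    stack = list(tips)
    result = set()
    visited = 0
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        visited += 1
        if obj in kept:
            if stop_at_kept:
                continue  # treat the kept object as a boundary
        else:
            result.add(obj)
        stack.extend(graph[obj])
    return result, visited


# Toy history: c3 is a new commit on top of kept commits c2 and c1.
graph = {
    "c3": ["c2", "t3"], "c2": ["c1", "t2"], "c1": ["t1"],
    "t3": ["b1", "b2"], "t2": ["b1"], "t1": ["b1"],
    "b1": [], "b2": [],
}
# Closed: every kept object references only kept objects.
kept = {"c2", "c1", "t2", "t1", "b1"}

full = reachable_unkept(graph, ["c3"], kept, stop_at_kept=False)
bounded = reachable_unkept(graph, ["c3"], kept, stop_at_kept=True)
```

If the kept set were not closed, the bounded walk could miss un-kept
objects that are reachable only through a kept one, which is why the
assumption has to hold for the results to be identical.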

This series attempts to do just that.  The fourth patch ("repack: optionally
assume transitive kept packs") adds an option to builtin/repack that lists
commits (and their trees) within kept packs as UNINTERESTING to its spawned
builtin/pack-objects command.  Together with inducing the use of sparse
reachability, this speeds up enumeration of candidate objects for repacking
and thereby substantially reduces the runtime of our repack operations, while
producing identical results.  Some test results are reported in that patch's
commit log.

This reroll incorporates feedback from Derrick Stolee, is rebased, and
is updated to pass "make test".

Nathaniel Filardo (5):
  count-objects: report statistics about kept packs
  revision walk: optionally use sparse reachability
  repack: add --sparse and pass to pack-objects
  repack: optionally assume transitive kept packs
  builtin/gc: add --assume-pack-keep-transitive

 Documentation/git-count-objects.txt |  8 ++++
 Documentation/git-gc.txt            |  4 ++
 Documentation/git-repack.txt        | 25 ++++++++++++
 bisect.c                            |  2 +-
 builtin/count-objects.c             | 17 +++++++-
 builtin/gc.c                        |  5 +++
 builtin/pack-objects.c              |  3 +-
 builtin/repack.c                    | 60 ++++++++++++++++++++++++++++-
 builtin/rev-list.c                  |  2 +-
 http-push.c                         |  2 +-
 list-objects.c                      |  5 +--
 list-objects.h                      |  3 +-
 revision.c                          |  3 +-
 revision.h                          |  1 +
 t/t5322-pack-objects-sparse.sh      |  6 +++
 t/t5500-fetch-pack.sh               |  8 ++--
 16 files changed, 137 insertions(+), 17 deletions(-)

-- 
2.17.1


Thread overview: 11+ messages
2019-06-24 12:07 [PATCH v3 0/5] Speed up repacking when lots of pack-kept objects Nathaniel Filardo
2019-06-24 12:07 ` [PATCH v3 1/5] count-objects: report statistics about kept packs Nathaniel Filardo
2019-06-24 12:52   ` Derrick Stolee
2019-06-24 12:07 ` [PATCH v3 2/5] revision walk: optionally use sparse reachability Nathaniel Filardo
2019-06-24 12:54   ` Derrick Stolee
2019-06-24 12:07 ` [PATCH v3 3/5] repack: add --sparse and pass to pack-objects Nathaniel Filardo
2019-06-24 13:03   ` Derrick Stolee
2019-06-24 12:07 ` [PATCH v3 4/5] repack: optionally assume transitive kept packs Nathaniel Filardo
2019-06-24 13:21   ` Derrick Stolee
2019-06-25 10:32     ` Dr N.W. Filardo
2019-06-24 12:07 ` [PATCH v3 5/5] builtin/gc: add --assume-pack-keep-transitive Nathaniel Filardo
