* [PATCH v2 0/3] repack: implement `--cruft-max-size`
From: Taylor Blau @ 2023-10-03 0:44 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt, Eric Sunshine
In-Reply-To: <cover.1694123506.git.me@ttaylorr.com>
(The earlier round of these patches depended on a couple of in-flight
topics, but this series has been rebased onto the tip of 'master').
This is a reroll of my series that introduces more ergonomic options for
creating and dealing with multiple cruft packs.
Much is unchanged since last time, but some notable changes that have
taken place include:
- adding a missing commit message / s-o-b to the first patch to split
cruft-related tests out of t7700 into t7704.
- renamed the option to `--max-cruft-size` (instead of
`--cruft-max-size`), and made corresponding changes to the
documentation, configuration options, tests, etc.
- cleaned up our handling of how we mark packs to be deleted and
retained
- parsed `--max-cruft-size` as an unsigned long up front instead of
blindly passing a char * down to pack-objects to parse it for us.
The last of those also affects `--max-pack-size`, which caused this
series to grow a new second patch which cleans up the existing code
which suffers from the same issue.
As usual, a range-diff is available below. Thanks for all of the helpful
review on the earlier round, and thanks in advance for your review on
this one!
Taylor Blau (3):
t7700: split cruft-related tests to t7704
builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
builtin/repack.c: implement support for `--max-cruft-size`
Documentation/config/gc.txt | 6 +
Documentation/git-gc.txt | 7 +
Documentation/git-repack.txt | 11 +
builtin/gc.c | 7 +
builtin/repack.c | 140 +++++++++++--
t/t6500-gc.sh | 27 +++
t/t7700-repack.sh | 121 -----------
t/t7704-repack-cruft.sh | 375 +++++++++++++++++++++++++++++++++++
8 files changed, 559 insertions(+), 135 deletions(-)
create mode 100755 t/t7704-repack-cruft.sh
Range-diff against v1:
1: 103e19c75a < -: ---------- builtin/repack.c: extract structure to store existing packs
2: 45d5f15308 < -: ---------- builtin/repack.c: extract marking packs for deletion
3: 30eccbfdcb < -: ---------- builtin/repack.c: extract redundant pack cleanup for --geometric
4: 57e322301d < -: ---------- builtin/repack.c: extract redundant pack cleanup for existing packs
5: 4df4a2e51e < -: ---------- builtin/repack.c: extract `has_existing_non_kept_packs()`
6: 360be582b4 < -: ---------- builtin/repack.c: store existing cruft packs separately
7: ef9c8d775b < -: ---------- builtin/repack.c: drop `DELETE_PACK` macro
8: 873bc16abb < -: ---------- builtin/repack.c: extract common cruft pack loop
9: de6c2a0d70 ! 1: 3ed4ab61f6 t7700: split cruft-related tests to t7704
@@ Metadata
## Commit message ##
t7700: split cruft-related tests to t7704
+ A small handful of the tests in t7700 (the main script for testing
+ functionality of 'git repack') are specifically related to cruft pack
+ operations.
+
+ Prepare for adding new cruft pack-related tests by moving the existing
+ set into a new test script.
+
+ Signed-off-by: Taylor Blau <me@ttaylorr.com>
+
## t/t7700-repack.sh ##
@@ t/t7700-repack.sh: test_expect_success '-n overrides repack.updateServerInfo=true' '
test_server_info_missing
-: ---------- > 2: 9ec999882d builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
10: 7e4e42e1aa ! 3: e7beb2060d builtin/repack.c: implement support for `--cruft-max-size`
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- builtin/repack.c: implement support for `--cruft-max-size`
+ builtin/repack.c: implement support for `--max-cruft-size`
Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
@@ Commit message
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).
- This all works, but can be costly from an I/O-perspective when a
- repository has either (a) many unreachable objects, (b) prunes objects
- relatively infrequently/never, or (c) both.
+ This all works, but can be costly from an I/O-perspective when
+ frequently repacking a repository that has many unreachable objects.
+ This problem is exacerbated when those unreachable objects are rarely
+ (if every) pruned.
Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
@@ Commit message
multiple cruft packs. This patch implements that support which we were
lacking.
- Introduce a new option `--cruft-max-size` which allows repositories to
+ Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
@@ Commit message
pack, along with any other unreachable objects which have since
entered the repository.
- This limits the I/O churn up to a quadratic function of the value
- specified by the `--cruft-max-size` option, instead of behaving
- quadratically in the number of total unreachable objects.
+ Once a cruft pack grows beyond the size specified via `--max-cruft-size`
+ the pack is effectively frozen. This limits the I/O churn up to a
+ quadratic function of the value specified by the `--max-cruft-size`
+ option, instead of behaving quadratically in the number of total
+ unreachable objects.
- When pruning unreachable objects, we bypass the new paths which combine
- small cruft packs together, and instead start from scratch, passing in
- the appropriate `--max-pack-size` down to `pack-objects`, putting it in
- charge of keeping the resulting set of cruft packs sized correctly.
+ When pruning unreachable objects, we bypass the new code paths which
+ combine small cruft packs together, and instead start from scratch,
+ passing in the appropriate `--max-pack-size` down to `pack-objects`,
+ putting it in charge of keeping the resulting set of cruft packs sized
+ correctly.
This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for whom all or most objects are removed,
@@ Documentation/config/gc.txt: gc.cruftPacks::
linkgit:git-repack[1]) instead of as loose objects. The default
is `true`.
-+gc.cruftMaxSize::
++gc.maxCruftSize::
+ Limit the size of new cruft packs when repacking. When
-+ specified in addition to `--cruft-max-size`, the command line
-+ option takes priority. See the `--cruft-max-size` option of
++ specified in addition to `--max-cruft-size`, the command line
++ option takes priority. See the `--max-cruft-size` option of
+ linkgit:git-repack[1].
+
gc.pruneExpire::
@@ Documentation/git-gc.txt: be performed as well.
cruft pack instead of storing them as loose objects. `--cruft`
is on by default.
-+--cruft-max-size=<n>::
++--max-cruft-size=<n>::
+ When packing unreachable objects into a cruft pack, limit the
+ size of new cruft packs to be at most `<n>`. Overrides any
-+ value specified via the `gc.cruftMaxSize` configuration. See
-+ the `--cruft-max-size` option of linkgit:git-repack[1] for
++ value specified via the `gc.maxCruftSize` configuration. See
++ the `--max-cruft-size` option of linkgit:git-repack[1] for
+ more.
+
--prune=<date>::
@@ Documentation/git-repack.txt: to the new separate pack will be written.
immediately instead of waiting for the next `git gc` invocation.
Only useful with `--cruft -d`.
-+--cruft-max-size=<n>::
-+ Repack cruft objects into packs as large as `<n>` before
++--max-cruft-size=<n>::
++ Repack cruft objects into packs as large as `<n>` bytes before
+ creating new packs. As long as there are enough cruft packs
+ smaller than `<n>`, repacking will cause a new cruft pack to
+ be created containing objects from any combined cruft packs,
-+ along with any new unreachable objects. Cruft packs larger
-+ than `<n>` will not be modified. Only useful with `--cruft
-+ -d`.
++ along with any new unreachable objects. Cruft packs larger than
++ `<n>` will not be modified. When the new cruft pack is larger
++ than `<n>` bytes, it will be split into multiple packs, all of
++ which are guaranteed to be at most `<n>` bytes in size. Only
++ useful with `--cruft -d`.
+
--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
@@ builtin/gc.c: static const char * const builtin_gc_usage[] = {
static int pack_refs = 1;
static int prune_reflogs = 1;
static int cruft_packs = 1;
-+static char *cruft_max_size;
++static unsigned long max_cruft_size;
static int aggressive_depth = 50;
static int aggressive_window = 250;
static int gc_auto_threshold = 6700;
@@ builtin/gc.c: static void gc_config(void)
git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
git_config_get_bool("gc.autodetach", &detach_auto);
git_config_get_bool("gc.cruftpacks", &cruft_packs);
-+ git_config_get_string("gc.cruftmaxsize", &cruft_max_size);
++ git_config_get_ulong("gc.maxcruftsize", &max_cruft_size);
git_config_get_expiry("gc.pruneexpire", &prune_expire);
git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
git_config_get_expiry("gc.logexpiry", &gc_log_expire);
@@ builtin/gc.c: static void add_repack_all_option(struct string_list *keep_pack)
strvec_push(&repack, "--cruft");
if (prune_expire)
strvec_pushf(&repack, "--cruft-expiration=%s", prune_expire);
-+ if (cruft_max_size)
-+ strvec_pushf(&repack, "--cruft-max-size=%s",
-+ cruft_max_size);
++ if (max_cruft_size)
++ strvec_pushf(&repack, "--max-cruft-size=%lu",
++ max_cruft_size);
} else {
strvec_push(&repack, "-A");
if (prune_expire)
@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
N_("prune unreferenced objects"),
PARSE_OPT_OPTARG, NULL, (intptr_t)prune_expire },
OPT_BOOL(0, "cruft", &cruft_packs, N_("pack unreferenced objects separately")),
-+ OPT_STRING(0, "cruft-max-size", &cruft_max_size,
-+ N_("bytes"),
-+ N_("with --cruft, limit the size of new cruft packs")),
++ OPT_MAGNITUDE(0, "max-cruft-size", &max_cruft_size,
++ N_("with --cruft, limit the size of new cruft packs")),
OPT_BOOL(0, "aggressive", &aggressive, N_("be more thorough (increased runtime)")),
OPT_BOOL_F(0, "auto", &auto_gc, N_("enable auto-gc mode"),
PARSE_OPT_NOCOMPLETE),
## builtin/repack.c ##
@@
- #define LOOSEN_UNREACHABLE 2
#define PACK_CRUFT 4
-+#define DELETE_PACK ((void*)(uintptr_t)1)
-+#define RETAIN_PACK ((uintptr_t)(1<<1))
-+
+ #define DELETE_PACK 1
++#define RETAIN_PACK 2
+
static int pack_everything;
static int delta_base_offset = 1;
- static int pack_kept_objects = -1;
-@@ builtin/repack.c: static int has_existing_non_kept_packs(const struct existing_packs *existing)
- return existing->non_kept_packs.nr || existing->cruft_packs.nr;
+@@ builtin/repack.c: static void pack_mark_for_deletion(struct string_list_item *item)
+ item->util = (void*)((uintptr_t)item->util | DELETE_PACK);
}
++static void pack_unmark_for_deletion(struct string_list_item *item)
++{
++ item->util = (void*)((uintptr_t)item->util & ~DELETE_PACK);
++}
++
+ static int pack_is_marked_for_deletion(struct string_list_item *item)
+ {
+ return (uintptr_t)item->util & DELETE_PACK;
+ }
+
++static void pack_mark_retained(struct string_list_item *item)
++{
++ item->util = (void*)((uintptr_t)item->util | RETAIN_PACK);
++}
++
+static int pack_is_retained(struct string_list_item *item)
+{
+ return (uintptr_t)item->util & RETAIN_PACK;
@@ builtin/repack.c: static void mark_packs_for_deletion_1(struct string_list *name
- * (if `-d` was given).
- */
- if (!string_list_has_string(names, sha1))
-- item->util = (void*)1;
+
+ if (pack_is_retained(item)) {
-+ item->util = NULL;
++ pack_unmark_for_deletion(item);
+ } else if (!string_list_has_string(names, sha1)) {
+ /*
+ * Mark this pack for deletion, which ensures
@@ builtin/repack.c: static void mark_packs_for_deletion_1(struct string_list *name
+ * will actually delete this pack (if `-d` was
+ * given).
+ */
-+ item->util = DELETE_PACK;
+ pack_mark_for_deletion(item);
+ }
}
}
@@ builtin/repack.c: static void mark_packs_for_deletion_1(struct string_list *name
+ if (!item)
+ BUG("could not find cruft pack '%s'", pack_basename(cruft));
+
-+ item->util = (void*)RETAIN_PACK;
++ pack_mark_retained(item);
+ strbuf_release(&buf);
+}
+
@@ builtin/repack.c: static void remove_redundant_bitmaps(struct string_list *inclu
+ return 0;
+}
+
-+static void collapse_small_cruft_packs(FILE *in, unsigned long max_size,
++static void collapse_small_cruft_packs(FILE *in, size_t max_size,
+ struct existing_packs *existing)
+{
+ struct packed_git **existing_cruft, *p;
+ struct strbuf buf = STRBUF_INIT;
-+ unsigned long total_size = 0;
++ size_t total_size = 0;
+ size_t existing_cruft_nr = 0;
+ size_t i;
+
@@ builtin/repack.c: static void remove_redundant_bitmaps(struct string_list *inclu
+ QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp);
+
+ for (i = 0; i < existing_cruft_nr; i++) {
-+ off_t proposed;
++ size_t proposed;
+
+ p = existing_cruft[i];
+ proposed = st_add(total_size, p->pack_size);
@@ builtin/repack.c: static int write_cruft_pack(const struct pack_objects_args *ar
- for_each_string_list_item(item, &existing->cruft_packs)
- fprintf(in, "-%s.pack\n", item->string);
+ if (args->max_pack_size && !cruft_expiration) {
-+ unsigned long max_pack_size;
-+ if (!git_parse_ulong(args->max_pack_size, &max_pack_size))
-+ return error(_("could not parse --cruft-max-size: '%s'"),
-+ args->max_pack_size);
-+ collapse_small_cruft_packs(in, max_pack_size, existing);
++ collapse_small_cruft_packs(in, args->max_pack_size, existing);
+ } else {
+ for_each_string_list_item(item, &existing->non_kept_packs)
+ fprintf(in, "-%s.pack\n", item->string);
@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
PACK_CRUFT),
OPT_STRING(0, "cruft-expiration", &cruft_expiration, N_("approxidate"),
N_("with --cruft, expire objects older than this")),
-+ OPT_STRING(0, "cruft-max-size", &cruft_po_args.max_pack_size,
-+ N_("bytes"),
++ OPT_MAGNITUDE(0, "max-cruft-size", &cruft_po_args.max_pack_size,
+ N_("with --cruft, limit the size of new cruft packs")),
OPT_BOOL('d', NULL, &delete_redundant,
N_("remove redundant packs, and run git-prune-packed")),
@@ t/t6500-gc.sh: test_expect_success 'gc.bigPackThreshold ignores cruft packs' '
+cruft_max_size_opts="git repack -d -l --cruft --cruft-expiration=2.weeks.ago"
+
-+test_expect_success 'setup for --cruft-max-size tests' '
++test_expect_success 'setup for --max-cruft-size tests' '
+ git init cruft--max-size &&
+ (
+ cd cruft--max-size &&
@@ t/t6500-gc.sh: test_expect_success 'gc.bigPackThreshold ignores cruft packs' '
+ )
+'
+
-+test_expect_success '--cruft-max-size sets appropriate repack options' '
++test_expect_success '--max-cruft-size sets appropriate repack options' '
+ GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size \
-+ gc --cruft --cruft-max-size=1M &&
-+ test_subcommand $cruft_max_size_opts --cruft-max-size=1M <trace2.txt
++ gc --cruft --max-cruft-size=1M &&
++ test_subcommand $cruft_max_size_opts --max-cruft-size=1048576 <trace2.txt
+'
+
-+test_expect_success 'gc.cruftMaxSize sets appropriate repack options' '
++test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
+ GIT_TRACE2_EVENT=$(pwd)/trace2.txt \
-+ git -C cruft--max-size -c gc.cruftMaxSize=2M gc --cruft &&
-+ test_subcommand $cruft_max_size_opts --cruft-max-size=2M <trace2.txt &&
++ git -C cruft--max-size -c gc.maxCruftSize=2M gc --cruft &&
++ test_subcommand $cruft_max_size_opts --max-cruft-size=2097152 <trace2.txt &&
+
+ GIT_TRACE2_EVENT=$(pwd)/trace2.txt \
-+ git -C cruft--max-size -c gc.cruftMaxSize=2M gc --cruft \
-+ --cruft-max-size=3M &&
-+ test_subcommand $cruft_max_size_opts --cruft-max-size=3M <trace2.txt
++ git -C cruft--max-size -c gc.maxCruftSize=2M gc --cruft \
++ --max-cruft-size=3M &&
++ test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
+'
+
run_and_wait_for_auto_gc () {
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ echo "$packdir/pack-$pack.mtimes"
+}
+
-+test_expect_success '--cruft-max-size creates new packs when above threshold' '
-+ git init cruft-max-size-large &&
++test_expect_success '--max-cruft-size creates new packs when above threshold' '
++ git init max-cruft-size-large &&
+ (
-+ cd cruft-max-size-large &&
++ cd max-cruft-size-large &&
+ test_commit base &&
+
+ foo="$(pack_random_blob foo $((1*1024*1024)))" &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ cruft_foo="$(ls $packdir/pack-*.mtimes)" &&
+
+ bar="$(pack_random_blob bar $((1*1024*1024)))" &&
-+ git repack --cruft -d --cruft-max-size=1M &&
++ git repack --cruft -d --max-cruft-size=1M &&
+ cruft_bar="$(ls $packdir/pack-*.mtimes | grep -v $cruft_foo)" &&
+
+ test-tool pack-mtimes $(basename "$cruft_foo") >foo.objects &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ )
+'
+
-+test_expect_success '--cruft-max-size combines existing packs when below threshold' '
-+ git init cruft-max-size-small &&
++test_expect_success '--max-cruft-size combines existing packs when below threshold' '
++ git init max-cruft-size-small &&
+ (
-+ cd cruft-max-size-small &&
++ cd max-cruft-size-small &&
+ test_commit base &&
+
+ foo="$(pack_random_blob foo $((1*1024*1024)))" &&
+ git repack --cruft -d &&
+
+ bar="$(pack_random_blob bar $((1*1024*1024)))" &&
-+ git repack --cruft -d --cruft-max-size=10M &&
++ git repack --cruft -d --max-cruft-size=10M &&
+
+ cruft=$(ls $packdir/pack-*.mtimes) &&
+ test-tool pack-mtimes $(basename "$cruft") >cruft.objects &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ )
+'
+
-+test_expect_success '--cruft-max-size combines smaller packs first' '
-+ git init cruft-max-size-consume-small &&
++test_expect_success '--max-cruft-size combines smaller packs first' '
++ git init max-cruft-size-consume-small &&
+ (
-+ cd cruft-max-size-consume-small &&
++ cd max-cruft-size-consume-small &&
+
+ test_commit base &&
+ git repack -ad &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw &&
+ sort expect.raw >expect.objects &&
+
-+ # repacking with `--cruft-max-size=2M` should combine
++ # repacking with `--max-cruft-size=2M` should combine
+ # both 0.5 MiB packs together, instead of, say, one of
+ # the 0.5 MiB packs with the 1.0 MiB pack
+ ls $packdir/pack-*.mtimes | sort >cruft.before &&
-+ git repack -d --cruft --cruft-max-size=2M &&
++ git repack -d --cruft --max-cruft-size=2M &&
+ ls $packdir/pack-*.mtimes | sort >cruft.after &&
+
+ comm -13 cruft.before cruft.after >cruft.new &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ )
+'
+
-+test_expect_success 'setup --cruft-max-size with freshened objects' '
-+ git init cruft-max-size-freshen &&
++test_expect_success 'setup --max-cruft-size with freshened objects' '
++ git init max-cruft-size-freshen &&
+ (
-+ cd cruft-max-size-freshen &&
++ cd max-cruft-size-freshen &&
+
+ test_commit base &&
+ git repack -ad &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ )
+'
+
-+test_expect_success '--cruft-max-size with freshened objects (loose)' '
++test_expect_success '--max-cruft-size with freshened objects (loose)' '
+ (
-+ cd cruft-max-size-freshen &&
++ cd max-cruft-size-freshen &&
+
+ # regenerate the object, setting its mtime to be more recent
+ foo="$(generate_random_blob foo 64)" &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ )
+'
+
-+test_expect_success '--cruft-max-size with freshened objects (packed)' '
++test_expect_success '--max-cruft-size with freshened objects (packed)' '
+ (
-+ cd cruft-max-size-freshen &&
++ cd max-cruft-size-freshen &&
+
+ # regenerate the object and store it in a packfile,
+ # setting its mtime to be more recent
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ )
+'
+
-+test_expect_success '--cruft-max-size with pruning' '
-+ git init cruft-max-size-prune &&
++test_expect_success '--max-cruft-size with pruning' '
++ git init max-cruft-size-prune &&
+ (
-+ cd cruft-max-size-prune &&
++ cd max-cruft-size-prune &&
+
+ test_commit base &&
+ foo="$(generate_random_blob foo $((1024*1024)))" &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+
+ test-tool chmtime -10000 "$objdir/$(test_oid_to_path "$foo")" &&
+
-+ git repack -d --cruft --cruft-max-size=1M &&
++ git repack -d --cruft --max-cruft-size=1M &&
+
+ # backdate the mtimes of all cruft packs to validate
+ # that they were rewritten as a result of pruning
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ echo $cruft $mtime >>mtimes || return 1
+ done &&
+
-+ # repack (and prune) with a --cruft-max-size to ensure
++ # repack (and prune) with a --max-cruft-size to ensure
+ # that we appropriately split the resulting set of packs
-+ git repack -d --cruft --cruft-max-size=1M \
++ git repack -d --cruft --max-cruft-size=1M \
+ --cruft-expiration=10.seconds.ago &&
+ ls $packdir/pack-*.mtimes | sort >cruft.after &&
+
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ )
+'
+
-+test_expect_success '--cruft-max-size ignores non-local packs' '
-+ repo="cruft-max-size-non-local" &&
++test_expect_success '--max-cruft-size ignores non-local packs' '
++ repo="max-cruft-size-non-local" &&
+ git init $repo &&
+ (
+ cd $repo &&
@@ t/t7704-repack-cruft.sh: test_expect_success '--expire-to stores pruned objects
+ # ensure that we do not attempt to pick up packs from
+ # the non-alternated repository, which would result in a
+ # crash
-+ git repack --cruft --cruft-max-size=1M -d
++ git repack --cruft --max-cruft-size=1M -d
+ )
+'
+
--
2.42.0.310.g9604b54f73.dirty
^ permalink raw reply
* [PATCH v2 1/3] t7700: split cruft-related tests to t7704
From: Taylor Blau @ 2023-10-03 0:44 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt, Eric Sunshine
In-Reply-To: <cover.1696293862.git.me@ttaylorr.com>
A small handful of the tests in t7700 (the main script for testing
functionality of 'git repack') are specifically related to cruft pack
operations.
Prepare for adding new cruft pack-related tests by moving the existing
set into a new test script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7700-repack.sh | 121 -------------------------------------
t/t7704-repack-cruft.sh | 130 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 130 insertions(+), 121 deletions(-)
create mode 100755 t/t7704-repack-cruft.sh
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 27b66807cd..55ee3eb8ae 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -633,125 +633,4 @@ test_expect_success '-n overrides repack.updateServerInfo=true' '
test_server_info_missing
'
-test_expect_success '--expire-to stores pruned objects (now)' '
- git init expire-to-now &&
- (
- cd expire-to-now &&
-
- git branch -M main &&
-
- test_commit base &&
-
- git checkout -b cruft &&
- test_commit --no-tag cruft &&
-
- git rev-list --objects --no-object-names main..cruft >moved.raw &&
- sort moved.raw >moved.want &&
-
- git rev-list --all --objects --no-object-names >expect.raw &&
- sort expect.raw >expect &&
-
- git checkout main &&
- git branch -D cruft &&
- git reflog expire --all --expire=all &&
-
- git init --bare expired.git &&
- git repack -d \
- --cruft --cruft-expiration="now" \
- --expire-to="expired.git/objects/pack/pack" &&
-
- expired="$(ls expired.git/objects/pack/pack-*.idx)" &&
- test_path_is_file "${expired%.idx}.mtimes" &&
-
- # Since the `--cruft-expiration` is "now", the effective
- # behavior is to move _all_ unreachable objects out to
- # the location in `--expire-to`.
- git show-index <$expired >expired.raw &&
- cut -d" " -f2 expired.raw | sort >expired.objects &&
- git rev-list --all --objects --no-object-names \
- >remaining.objects &&
-
- # ...in other words, the combined contents of this
- # repository and expired.git should be the same as the
- # set of objects we started with.
- cat expired.objects remaining.objects | sort >actual &&
- test_cmp expect actual &&
-
- # The "moved" objects (i.e., those in expired.git)
- # should be the same as the cruft objects which were
- # expired in the previous step.
- test_cmp moved.want expired.objects
- )
-'
-
-test_expect_success '--expire-to stores pruned objects (5.minutes.ago)' '
- git init expire-to-5.minutes.ago &&
- (
- cd expire-to-5.minutes.ago &&
-
- git branch -M main &&
-
- test_commit base &&
-
- # Create two classes of unreachable objects, one which
- # is older than 5 minutes (stale), and another which is
- # newer (recent).
- for kind in stale recent
- do
- git checkout -b $kind main &&
- test_commit --no-tag $kind || return 1
- done &&
-
- git rev-list --objects --no-object-names main..stale >in &&
- stale="$(git pack-objects $objdir/pack/pack <in)" &&
- mtime="$(test-tool chmtime --get =-600 $objdir/pack/pack-$stale.pack)" &&
-
- # expect holds the set of objects we expect to find in
- # this repository after repacking
- git rev-list --objects --no-object-names recent >expect.raw &&
- sort expect.raw >expect &&
-
- # moved.want holds the set of objects we expect to find
- # in expired.git
- git rev-list --objects --no-object-names main..stale >out &&
- sort out >moved.want &&
-
- git checkout main &&
- git branch -D stale recent &&
- git reflog expire --all --expire=all &&
- git prune-packed &&
-
- git init --bare expired.git &&
- git repack -d \
- --cruft --cruft-expiration=5.minutes.ago \
- --expire-to="expired.git/objects/pack/pack" &&
-
- # Some of the remaining objects in this repository are
- # unreachable, so use `cat-file --batch-all-objects`
- # instead of `rev-list` to get their names
- git cat-file --batch-all-objects --batch-check="%(objectname)" \
- >remaining.objects &&
- sort remaining.objects >actual &&
- test_cmp expect actual &&
-
- (
- cd expired.git &&
-
- expired="$(ls objects/pack/pack-*.mtimes)" &&
- test-tool pack-mtimes $(basename $expired) >out &&
- cut -d" " -f1 out | sort >../moved.got &&
-
- # Ensure that there are as many objects with the
- # expected mtime as were moved to expired.git.
- #
- # In other words, ensure that the recorded
- # mtimes of any moved objects was written
- # correctly.
- grep " $mtime$" out >matching &&
- test_line_count = $(wc -l <../moved.want) matching
- ) &&
- test_cmp moved.want moved.got
- )
-'
-
test_done
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
new file mode 100755
index 0000000000..d91fcf1af1
--- /dev/null
+++ b/t/t7704-repack-cruft.sh
@@ -0,0 +1,130 @@
+#!/bin/sh
+
+test_description='git repack works correctly'
+
+. ./test-lib.sh
+
+objdir=.git/objects
+
+test_expect_success '--expire-to stores pruned objects (now)' '
+ git init expire-to-now &&
+ (
+ cd expire-to-now &&
+
+ git branch -M main &&
+
+ test_commit base &&
+
+ git checkout -b cruft &&
+ test_commit --no-tag cruft &&
+
+ git rev-list --objects --no-object-names main..cruft >moved.raw &&
+ sort moved.raw >moved.want &&
+
+ git rev-list --all --objects --no-object-names >expect.raw &&
+ sort expect.raw >expect &&
+
+ git checkout main &&
+ git branch -D cruft &&
+ git reflog expire --all --expire=all &&
+
+ git init --bare expired.git &&
+ git repack -d \
+ --cruft --cruft-expiration="now" \
+ --expire-to="expired.git/objects/pack/pack" &&
+
+ expired="$(ls expired.git/objects/pack/pack-*.idx)" &&
+ test_path_is_file "${expired%.idx}.mtimes" &&
+
+ # Since the `--cruft-expiration` is "now", the effective
+ # behavior is to move _all_ unreachable objects out to
+ # the location in `--expire-to`.
+ git show-index <$expired >expired.raw &&
+ cut -d" " -f2 expired.raw | sort >expired.objects &&
+ git rev-list --all --objects --no-object-names \
+ >remaining.objects &&
+
+ # ...in other words, the combined contents of this
+ # repository and expired.git should be the same as the
+ # set of objects we started with.
+ cat expired.objects remaining.objects | sort >actual &&
+ test_cmp expect actual &&
+
+ # The "moved" objects (i.e., those in expired.git)
+ # should be the same as the cruft objects which were
+ # expired in the previous step.
+ test_cmp moved.want expired.objects
+ )
+'
+
+test_expect_success '--expire-to stores pruned objects (5.minutes.ago)' '
+ git init expire-to-5.minutes.ago &&
+ (
+ cd expire-to-5.minutes.ago &&
+
+ git branch -M main &&
+
+ test_commit base &&
+
+ # Create two classes of unreachable objects, one which
+ # is older than 5 minutes (stale), and another which is
+ # newer (recent).
+ for kind in stale recent
+ do
+ git checkout -b $kind main &&
+ test_commit --no-tag $kind || return 1
+ done &&
+
+ git rev-list --objects --no-object-names main..stale >in &&
+ stale="$(git pack-objects $objdir/pack/pack <in)" &&
+ mtime="$(test-tool chmtime --get =-600 $objdir/pack/pack-$stale.pack)" &&
+
+ # expect holds the set of objects we expect to find in
+ # this repository after repacking
+ git rev-list --objects --no-object-names recent >expect.raw &&
+ sort expect.raw >expect &&
+
+ # moved.want holds the set of objects we expect to find
+ # in expired.git
+ git rev-list --objects --no-object-names main..stale >out &&
+ sort out >moved.want &&
+
+ git checkout main &&
+ git branch -D stale recent &&
+ git reflog expire --all --expire=all &&
+ git prune-packed &&
+
+ git init --bare expired.git &&
+ git repack -d \
+ --cruft --cruft-expiration=5.minutes.ago \
+ --expire-to="expired.git/objects/pack/pack" &&
+
+ # Some of the remaining objects in this repository are
+ # unreachable, so use `cat-file --batch-all-objects`
+ # instead of `rev-list` to get their names
+ git cat-file --batch-all-objects --batch-check="%(objectname)" \
+ >remaining.objects &&
+ sort remaining.objects >actual &&
+ test_cmp expect actual &&
+
+ (
+ cd expired.git &&
+
+ expired="$(ls objects/pack/pack-*.mtimes)" &&
+ test-tool pack-mtimes $(basename $expired) >out &&
+ cut -d" " -f1 out | sort >../moved.got &&
+
+ # Ensure that there are as many objects with the
+ # expected mtime as were moved to expired.git.
+ #
+ # In other words, ensure that the recorded
+ # mtimes of any moved objects was written
+ # correctly.
+ grep " $mtime$" out >matching &&
+ test_line_count = $(wc -l <../moved.want) matching
+ ) &&
+ test_cmp moved.want moved.got
+ )
+'
+
+test_done
--
2.42.0.310.g9604b54f73.dirty
^ permalink raw reply related
* [PATCH v2 2/3] builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
From: Taylor Blau @ 2023-10-03 0:44 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt, Eric Sunshine
In-Reply-To: <cover.1696293862.git.me@ttaylorr.com>
The repack builtin takes a `--max-pack-size` command-line argument which
it uses to feed into any of the pack-objects children that it may spawn
when generating a new pack.
This option is parsed with OPT_STRING, meaning that we'll accept
anything as input, punting on more fine-grained validation until we get
down into pack-objects.
This is fine, but it's wasteful to spend an entire sub-process just to
figure out that one of its option is bogus. Instead, parse the value of
`--max-pack-size` with OPT_MAGNITUDE in 'git repack', and then pass the
knonw-good result down to pack-objects.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 529e13120d..8a5bbb9cba 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -51,7 +51,7 @@ struct pack_objects_args {
const char *window_memory;
const char *depth;
const char *threads;
- const char *max_pack_size;
+ unsigned long max_pack_size;
int no_reuse_delta;
int no_reuse_object;
int quiet;
@@ -242,7 +242,7 @@ static void prepare_pack_objects(struct child_process *cmd,
if (args->threads)
strvec_pushf(&cmd->args, "--threads=%s", args->threads);
if (args->max_pack_size)
- strvec_pushf(&cmd->args, "--max-pack-size=%s", args->max_pack_size);
+ strvec_pushf(&cmd->args, "--max-pack-size=%lu", args->max_pack_size);
if (args->no_reuse_delta)
strvec_pushf(&cmd->args, "--no-reuse-delta");
if (args->no_reuse_object)
@@ -946,7 +946,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
N_("limits the maximum delta depth")),
OPT_STRING(0, "threads", &po_args.threads, N_("n"),
N_("limits the maximum number of threads")),
- OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
+ OPT_MAGNITUDE(0, "max-pack-size", &po_args.max_pack_size,
N_("maximum size of each packfile")),
OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
N_("repack objects in packs marked with .keep")),
--
2.42.0.310.g9604b54f73.dirty
^ permalink raw reply related
* [PATCH v2 3/3] builtin/repack.c: implement support for `--max-cruft-size`
From: Taylor Blau @ 2023-10-03 0:44 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt, Eric Sunshine
In-Reply-To: <cover.1696293862.git.me@ttaylorr.com>
Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
pruned out of the repository.
When cruft packs were first introduced back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and
a7d493833f (builtin/pack-objects.c: --cruft with expiration,
2022-05-20), the recommended workflow consisted of:
- Repacking periodically, either by packing anything loose in the
repository (via `git repack -d`) or producing a geometric sequence
of packs (via `git repack --geometric=<d> -d`).
- Every so often, splitting the repository into two packs, one cruft
to store the unreachable objects, and another non-cruft pack to
store the reachable objects.
Repositories may (out of band with the above) choose periodically to
prune out some unreachable objects which have aged out of the grace
period by generating a pack with `--cruft-expiration=<approxidate>`.
This allowed repositories to maintain relatively few packs on average,
and quarantine unreachable objects together in a cruft pack, avoiding
the pitfalls of holding unreachable objects as loose while they age out
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).
This all works, but can be costly from an I/O-perspective when
frequently repacking a repository that has many unreachable objects.
This problem is exacerbated when those unreachable objects are rarely
(if every) pruned.
Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
the pack is reused, this is a relatively inexpensive operation from a
CPU-perspective, but is very costly in terms of I/O since we end up
rewriting basically the same pack (plus any new unreachable objects that
have entered the repository since the last time a cruft pack was
generated).
At the time, we decided against implementing more robust support for
multiple cruft packs. This patch implements that support which we were
lacking.
Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
so:
- Sort a list of any existing cruft packs in ascending order of pack
size.
- Starting from the beginning of the list, group cruft packs together
while the accumulated size is smaller than the maximum specified
pack size.
- Combine the objects in these cruft packs together into a new cruft
pack, along with any other unreachable objects which have since
entered the repository.
Once a cruft pack grows beyond the size specified via `--max-cruft-size`
the pack is effectively frozen. This limits the I/O churn up to a
quadratic function of the value specified by the `--max-cruft-size`
option, instead of behaving quadratically in the number of total
unreachable objects.
When pruning unreachable objects, we bypass the new code paths which
combine small cruft packs together, and instead start from scratch,
passing in the appropriate `--max-pack-size` down to `pack-objects`,
putting it in charge of keeping the resulting set of cruft packs sized
correctly.
This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for whom all or most objects are removed,
and then generate a new cruft pack with just the remaining set of
objects. But this additional complexity buys us relatively little,
because most objects end up being pruned anyway, so the I/O churn is
well contained.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/config/gc.txt | 6 +
Documentation/git-gc.txt | 7 +
Documentation/git-repack.txt | 11 ++
builtin/gc.c | 7 +
builtin/repack.c | 134 +++++++++++++++++--
t/t6500-gc.sh | 27 ++++
t/t7704-repack-cruft.sh | 245 +++++++++++++++++++++++++++++++++++
7 files changed, 426 insertions(+), 11 deletions(-)
diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index ca47eb2008..83ebc2de55 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -86,6 +86,12 @@ gc.cruftPacks::
linkgit:git-repack[1]) instead of as loose objects. The default
is `true`.
+gc.maxCruftSize::
+ Limit the size of new cruft packs when repacking. When
+ specified in addition to `--max-cruft-size`, the command line
+ option takes priority. See the `--max-cruft-size` option of
+ linkgit:git-repack[1].
+
gc.pruneExpire::
When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'
(and 'repack --cruft --cruft-expiration 2.weeks.ago' if using
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 90806fd26a..fa0541b416 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -59,6 +59,13 @@ be performed as well.
cruft pack instead of storing them as loose objects. `--cruft`
is on by default.
+--max-cruft-size=<n>::
+ When packing unreachable objects into a cruft pack, limit the
+ size of new cruft packs to be at most `<n>`. Overrides any
+ value specified via the `gc.maxCruftSize` configuration. See
+ the `--max-cruft-size` option of linkgit:git-repack[1] for
+ more.
+
--prune=<date>::
Prune loose objects older than date (default is 2 weeks ago,
overridable by the config variable `gc.pruneExpire`).
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 4017157949..fbfc72e1b2 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -74,6 +74,17 @@ to the new separate pack will be written.
immediately instead of waiting for the next `git gc` invocation.
Only useful with `--cruft -d`.
+--max-cruft-size=<n>::
+ Repack cruft objects into packs as large as `<n>` bytes before
+ creating new packs. As long as there are enough cruft packs
+ smaller than `<n>`, repacking will cause a new cruft pack to
+ be created containing objects from any combined cruft packs,
+ along with any new unreachable objects. Cruft packs larger than
+ `<n>` will not be modified. When the new cruft pack is larger
+ than `<n>` bytes, it will be split into multiple packs, all of
+ which are guaranteed to be at most `<n>` bytes in size. Only
+ useful with `--cruft -d`.
+
--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
directory `<dir>`. This option is useful for keeping a copy of
diff --git a/builtin/gc.c b/builtin/gc.c
index 00192ae5d3..c97c9fb046 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -52,6 +52,7 @@ static const char * const builtin_gc_usage[] = {
static int pack_refs = 1;
static int prune_reflogs = 1;
static int cruft_packs = 1;
+static unsigned long max_cruft_size;
static int aggressive_depth = 50;
static int aggressive_window = 250;
static int gc_auto_threshold = 6700;
@@ -163,6 +164,7 @@ static void gc_config(void)
git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
git_config_get_bool("gc.autodetach", &detach_auto);
git_config_get_bool("gc.cruftpacks", &cruft_packs);
+ git_config_get_ulong("gc.maxcruftsize", &max_cruft_size);
git_config_get_expiry("gc.pruneexpire", &prune_expire);
git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
git_config_get_expiry("gc.logexpiry", &gc_log_expire);
@@ -347,6 +349,9 @@ static void add_repack_all_option(struct string_list *keep_pack)
strvec_push(&repack, "--cruft");
if (prune_expire)
strvec_pushf(&repack, "--cruft-expiration=%s", prune_expire);
+ if (max_cruft_size)
+ strvec_pushf(&repack, "--max-cruft-size=%lu",
+ max_cruft_size);
} else {
strvec_push(&repack, "-A");
if (prune_expire)
@@ -575,6 +580,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
N_("prune unreferenced objects"),
PARSE_OPT_OPTARG, NULL, (intptr_t)prune_expire },
OPT_BOOL(0, "cruft", &cruft_packs, N_("pack unreferenced objects separately")),
+ OPT_MAGNITUDE(0, "max-cruft-size", &max_cruft_size,
+ N_("with --cruft, limit the size of new cruft packs")),
OPT_BOOL(0, "aggressive", &aggressive, N_("be more thorough (increased runtime)")),
OPT_BOOL_F(0, "auto", &auto_gc, N_("enable auto-gc mode"),
PARSE_OPT_NOCOMPLETE),
diff --git a/builtin/repack.c b/builtin/repack.c
index 8a5bbb9cba..04770b15fe 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -27,6 +27,7 @@
#define PACK_CRUFT 4
#define DELETE_PACK 1
+#define RETAIN_PACK 2
static int pack_everything;
static int delta_base_offset = 1;
@@ -116,11 +117,26 @@ static void pack_mark_for_deletion(struct string_list_item *item)
item->util = (void*)((uintptr_t)item->util | DELETE_PACK);
}
+static void pack_unmark_for_deletion(struct string_list_item *item)
+{
+ item->util = (void*)((uintptr_t)item->util & ~DELETE_PACK);
+}
+
static int pack_is_marked_for_deletion(struct string_list_item *item)
{
return (uintptr_t)item->util & DELETE_PACK;
}
+static void pack_mark_retained(struct string_list_item *item)
+{
+ item->util = (void*)((uintptr_t)item->util | RETAIN_PACK);
+}
+
+static int pack_is_retained(struct string_list_item *item)
+{
+ return (uintptr_t)item->util & RETAIN_PACK;
+}
+
static void mark_packs_for_deletion_1(struct string_list *names,
struct string_list *list)
{
@@ -133,17 +149,39 @@ static void mark_packs_for_deletion_1(struct string_list *names,
if (len < hexsz)
continue;
sha1 = item->string + len - hexsz;
- /*
- * Mark this pack for deletion, which ensures that this
- * pack won't be included in a MIDX (if `--write-midx`
- * was given) and that we will actually delete this pack
- * (if `-d` was given).
- */
- if (!string_list_has_string(names, sha1))
+
+ if (pack_is_retained(item)) {
+ pack_unmark_for_deletion(item);
+ } else if (!string_list_has_string(names, sha1)) {
+ /*
+ * Mark this pack for deletion, which ensures
+ * that this pack won't be included in a MIDX
+ * (if `--write-midx` was given) and that we
+ * will actually delete this pack (if `-d` was
+ * given).
+ */
pack_mark_for_deletion(item);
+ }
}
}
+static void retain_cruft_pack(struct existing_packs *existing,
+ struct packed_git *cruft)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+
+ strbuf_addstr(&buf, pack_basename(cruft));
+ strbuf_strip_suffix(&buf, ".pack");
+
+ item = string_list_lookup(&existing->cruft_packs, buf.buf);
+ if (!item)
+ BUG("could not find cruft pack '%s'", pack_basename(cruft));
+
+ pack_mark_retained(item);
+ strbuf_release(&buf);
+}
+
static void mark_packs_for_deletion(struct existing_packs *existing,
struct string_list *names)
@@ -225,6 +263,8 @@ static void collect_pack_filenames(struct existing_packs *existing,
}
string_list_sort(&existing->kept_packs);
+ string_list_sort(&existing->non_kept_packs);
+ string_list_sort(&existing->cruft_packs);
strbuf_release(&buf);
}
@@ -806,6 +846,72 @@ static void remove_redundant_bitmaps(struct string_list *include,
strbuf_release(&path);
}
+static int existing_cruft_pack_cmp(const void *va, const void *vb)
+{
+ struct packed_git *a = *(struct packed_git **)va;
+ struct packed_git *b = *(struct packed_git **)vb;
+
+ if (a->pack_size < b->pack_size)
+ return -1;
+ if (a->pack_size > b->pack_size)
+ return 1;
+ return 0;
+}
+
+static void collapse_small_cruft_packs(FILE *in, size_t max_size,
+ struct existing_packs *existing)
+{
+ struct packed_git **existing_cruft, *p;
+ struct strbuf buf = STRBUF_INIT;
+ size_t total_size = 0;
+ size_t existing_cruft_nr = 0;
+ size_t i;
+
+ ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr);
+
+ for (p = get_all_packs(the_repository); p; p = p->next) {
+ if (!(p->is_cruft && p->pack_local))
+ continue;
+
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, pack_basename(p));
+ strbuf_strip_suffix(&buf, ".pack");
+
+ if (!string_list_has_string(&existing->cruft_packs, buf.buf))
+ continue;
+
+ if (existing_cruft_nr >= existing->cruft_packs.nr)
+ BUG("too many cruft packs (found %"PRIuMAX", but knew "
+ "of %"PRIuMAX")",
+ (uintmax_t)existing_cruft_nr + 1,
+ (uintmax_t)existing->cruft_packs.nr);
+ existing_cruft[existing_cruft_nr++] = p;
+ }
+
+ QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp);
+
+ for (i = 0; i < existing_cruft_nr; i++) {
+ size_t proposed;
+
+ p = existing_cruft[i];
+ proposed = st_add(total_size, p->pack_size);
+
+ if (proposed <= max_size) {
+ total_size = proposed;
+ fprintf(in, "-%s\n", pack_basename(p));
+ } else {
+ retain_cruft_pack(existing, p);
+ fprintf(in, "%s\n", pack_basename(p));
+ }
+ }
+
+ for (i = 0; i < existing->non_kept_packs.nr; i++)
+ fprintf(in, "-%s.pack\n",
+ existing->non_kept_packs.items[i].string);
+
+ strbuf_release(&buf);
+}
+
static int write_cruft_pack(const struct pack_objects_args *args,
const char *destination,
const char *pack_prefix,
@@ -853,10 +959,14 @@ static int write_cruft_pack(const struct pack_objects_args *args,
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
- for_each_string_list_item(item, &existing->non_kept_packs)
- fprintf(in, "-%s.pack\n", item->string);
- for_each_string_list_item(item, &existing->cruft_packs)
- fprintf(in, "-%s.pack\n", item->string);
+ if (args->max_pack_size && !cruft_expiration) {
+ collapse_small_cruft_packs(in, args->max_pack_size, existing);
+ } else {
+ for_each_string_list_item(item, &existing->non_kept_packs)
+ fprintf(in, "-%s.pack\n", item->string);
+ for_each_string_list_item(item, &existing->cruft_packs)
+ fprintf(in, "-%s.pack\n", item->string);
+ }
for_each_string_list_item(item, &existing->kept_packs)
fprintf(in, "%s.pack\n", item->string);
fclose(in);
@@ -919,6 +1029,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
PACK_CRUFT),
OPT_STRING(0, "cruft-expiration", &cruft_expiration, N_("approxidate"),
N_("with --cruft, expire objects older than this")),
+ OPT_MAGNITUDE(0, "max-cruft-size", &cruft_po_args.max_pack_size,
+ N_("with --cruft, limit the size of new cruft packs")),
OPT_BOOL('d', NULL, &delete_redundant,
N_("remove redundant packs, and run git-prune-packed")),
OPT_BOOL('f', NULL, &po_args.no_reuse_delta,
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 69509d0c11..0d6b5c3b27 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -303,6 +303,33 @@ test_expect_success 'gc.bigPackThreshold ignores cruft packs' '
)
'
+cruft_max_size_opts="git repack -d -l --cruft --cruft-expiration=2.weeks.ago"
+
+test_expect_success 'setup for --max-cruft-size tests' '
+ git init cruft--max-size &&
+ (
+ cd cruft--max-size &&
+ prepare_cruft_history
+ )
+'
+
+test_expect_success '--max-cruft-size sets appropriate repack options' '
+ GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size \
+ gc --cruft --max-cruft-size=1M &&
+ test_subcommand $cruft_max_size_opts --max-cruft-size=1048576 <trace2.txt
+'
+
+test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
+ GIT_TRACE2_EVENT=$(pwd)/trace2.txt \
+ git -C cruft--max-size -c gc.maxCruftSize=2M gc --cruft &&
+ test_subcommand $cruft_max_size_opts --max-cruft-size=2097152 <trace2.txt &&
+
+ GIT_TRACE2_EVENT=$(pwd)/trace2.txt \
+ git -C cruft--max-size -c gc.maxCruftSize=2M gc --cruft \
+ --max-cruft-size=3M &&
+ test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
+'
+
run_and_wait_for_auto_gc () {
# We read stdout from gc for the side effect of waiting until the
# background gc process exits, closing its fd 9. Furthermore, the
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index d91fcf1af1..dc86ca8269 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -5,6 +5,7 @@ test_description='git repack works correctly'
. ./test-lib.sh
objdir=.git/objects
+packdir=$objdir/pack
test_expect_success '--expire-to stores pruned objects (now)' '
git init expire-to-now &&
@@ -127,4 +128,248 @@ test_expect_success '--expire-to stores pruned objects (5.minutes.ago)' '
)
'
+generate_random_blob() {
+ test-tool genrandom "$@" >blob &&
+ git hash-object -w -t blob blob &&
+ rm blob
+}
+
+pack_random_blob () {
+ generate_random_blob "$@" &&
+ git repack -d -q >/dev/null
+}
+
+generate_cruft_pack () {
+ pack_random_blob "$@" >/dev/null &&
+
+ ls $packdir/pack-*.pack | xargs -n 1 basename >in &&
+ pack="$(git pack-objects --cruft $packdir/pack <in)" &&
+ git prune-packed &&
+
+ echo "$packdir/pack-$pack.mtimes"
+}
+
+test_expect_success '--max-cruft-size creates new packs when above threshold' '
+ git init max-cruft-size-large &&
+ (
+ cd max-cruft-size-large &&
+ test_commit base &&
+
+ foo="$(pack_random_blob foo $((1*1024*1024)))" &&
+ git repack --cruft -d &&
+ cruft_foo="$(ls $packdir/pack-*.mtimes)" &&
+
+ bar="$(pack_random_blob bar $((1*1024*1024)))" &&
+ git repack --cruft -d --max-cruft-size=1M &&
+ cruft_bar="$(ls $packdir/pack-*.mtimes | grep -v $cruft_foo)" &&
+
+ test-tool pack-mtimes $(basename "$cruft_foo") >foo.objects &&
+ test-tool pack-mtimes $(basename "$cruft_bar") >bar.objects &&
+
+ grep "^$foo" foo.objects &&
+ test_line_count = 1 foo.objects &&
+ grep "^$bar" bar.objects &&
+ test_line_count = 1 bar.objects
+ )
+'
+
+test_expect_success '--max-cruft-size combines existing packs when below threshold' '
+ git init max-cruft-size-small &&
+ (
+ cd max-cruft-size-small &&
+ test_commit base &&
+
+ foo="$(pack_random_blob foo $((1*1024*1024)))" &&
+ git repack --cruft -d &&
+
+ bar="$(pack_random_blob bar $((1*1024*1024)))" &&
+ git repack --cruft -d --max-cruft-size=10M &&
+
+ cruft=$(ls $packdir/pack-*.mtimes) &&
+ test-tool pack-mtimes $(basename "$cruft") >cruft.objects &&
+
+ grep "^$foo" cruft.objects &&
+ grep "^$bar" cruft.objects &&
+ test_line_count = 2 cruft.objects
+ )
+'
+
+test_expect_success '--max-cruft-size combines smaller packs first' '
+ git init max-cruft-size-consume-small &&
+ (
+ cd max-cruft-size-consume-small &&
+
+ test_commit base &&
+ git repack -ad &&
+
+ cruft_foo="$(generate_cruft_pack foo 524288)" && # 0.5 MiB
+ cruft_bar="$(generate_cruft_pack bar 524288)" && # 0.5 MiB
+ cruft_baz="$(generate_cruft_pack baz 1048576)" && # 1.0 MiB
+ cruft_quux="$(generate_cruft_pack quux 1572864)" && # 1.5 MiB
+
+ test-tool pack-mtimes "$(basename $cruft_foo)" >expect.raw &&
+ test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw &&
+ sort expect.raw >expect.objects &&
+
+ # repacking with `--max-cruft-size=2M` should combine
+ # both 0.5 MiB packs together, instead of, say, one of
+ # the 0.5 MiB packs with the 1.0 MiB pack
+ ls $packdir/pack-*.mtimes | sort >cruft.before &&
+ git repack -d --cruft --max-cruft-size=2M &&
+ ls $packdir/pack-*.mtimes | sort >cruft.after &&
+
+ comm -13 cruft.before cruft.after >cruft.new &&
+ comm -23 cruft.before cruft.after >cruft.removed &&
+
+ test_line_count = 1 cruft.new &&
+ test_line_count = 2 cruft.removed &&
+
+ # the two smaller packs should be rolled up first
+ printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed &&
+ test_cmp expect.removed cruft.removed &&
+
+ # ...and contain the set of objects rolled up
+ test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw &&
+ sort actual.raw >actual.objects &&
+
+ test_cmp expect.objects actual.objects
+ )
+'
+
+test_expect_success 'setup --max-cruft-size with freshened objects' '
+ git init max-cruft-size-freshen &&
+ (
+ cd max-cruft-size-freshen &&
+
+ test_commit base &&
+ git repack -ad &&
+
+ foo="$(generate_random_blob foo 64)" &&
+ test-tool chmtime --get -10000 \
+ "$objdir/$(test_oid_to_path "$foo")" >foo.mtime &&
+
+ git repack --cruft -d &&
+
+ cruft="$(ls $packdir/pack-*.mtimes)" &&
+ test-tool pack-mtimes "$(basename $cruft)" >actual &&
+ echo "$foo $(cat foo.mtime)" >expect &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success '--max-cruft-size with freshened objects (loose)' '
+ (
+ cd max-cruft-size-freshen &&
+
+ # regenerate the object, setting its mtime to be more recent
+ foo="$(generate_random_blob foo 64)" &&
+ test-tool chmtime --get -100 \
+ "$objdir/$(test_oid_to_path "$foo")" >foo.mtime &&
+
+ git repack --cruft -d &&
+
+ cruft="$(ls $packdir/pack-*.mtimes)" &&
+ test-tool pack-mtimes "$(basename $cruft)" >actual &&
+ echo "$foo $(cat foo.mtime)" >expect &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success '--max-cruft-size with freshened objects (packed)' '
+ (
+ cd max-cruft-size-freshen &&
+
+ # regenerate the object and store it in a packfile,
+ # setting its mtime to be more recent
+ #
+ # store it alongside another cruft object so that we
+ # do not create an identical copy of the existing
+ # cruft pack (which contains $foo).
+ foo="$(generate_random_blob foo 64)" &&
+ bar="$(generate_random_blob bar 64)" &&
+ foo_pack="$(printf "%s\n" $foo $bar | git pack-objects $packdir/pack)" &&
+ git prune-packed &&
+
+ test-tool chmtime --get -10 \
+ "$packdir/pack-$foo_pack.pack" >foo.mtime &&
+
+ git repack --cruft -d &&
+
+ cruft="$(ls $packdir/pack-*.mtimes)" &&
+ test-tool pack-mtimes "$(basename $cruft)" >actual &&
+ echo "$foo $(cat foo.mtime)" >expect.raw &&
+ echo "$bar $(cat foo.mtime)" >>expect.raw &&
+ sort expect.raw >expect &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success '--max-cruft-size with pruning' '
+ git init max-cruft-size-prune &&
+ (
+ cd max-cruft-size-prune &&
+
+ test_commit base &&
+ foo="$(generate_random_blob foo $((1024*1024)))" &&
+ bar="$(generate_random_blob bar $((1024*1024)))" &&
+ baz="$(generate_random_blob baz $((1024*1024)))" &&
+
+ test-tool chmtime -10000 "$objdir/$(test_oid_to_path "$foo")" &&
+
+ git repack -d --cruft --max-cruft-size=1M &&
+
+ # backdate the mtimes of all cruft packs to validate
+ # that they were rewritten as a result of pruning
+ ls $packdir/pack-*.mtimes | sort >cruft.before &&
+ for cruft in $(cat cruft.before)
+ do
+ mtime="$(test-tool chmtime --get -10000 "$cruft")" &&
+ echo $cruft $mtime >>mtimes || return 1
+ done &&
+
+ # repack (and prune) with a --max-cruft-size to ensure
+ # that we appropriately split the resulting set of packs
+ git repack -d --cruft --max-cruft-size=1M \
+ --cruft-expiration=10.seconds.ago &&
+ ls $packdir/pack-*.mtimes | sort >cruft.after &&
+
+ for cruft in $(cat cruft.after)
+ do
+ old_mtime="$(grep $cruft mtimes | cut -d" " -f2)" &&
+ new_mtime="$(test-tool chmtime --get $cruft)" &&
+ test $old_mtime -lt $new_mtime || return 1
+ done &&
+
+ test_line_count = 3 cruft.before &&
+ test_line_count = 2 cruft.after &&
+ test_must_fail git cat-file -e $foo &&
+ git cat-file -e $bar &&
+ git cat-file -e $baz
+ )
+'
+
+test_expect_success '--max-cruft-size ignores non-local packs' '
+ repo="max-cruft-size-non-local" &&
+ git init $repo &&
+ (
+ cd $repo &&
+ test_commit base &&
+ generate_random_blob foo 64 &&
+ git repack --cruft -d
+ ) &&
+
+ git clone --reference=$repo $repo $repo-alt &&
+ (
+ cd $repo-alt &&
+
+ test_commit other &&
+ generate_random_blob bar 64 &&
+
+ # ensure that we do not attempt to pick up packs from
+ # the non-alternated repository, which would result in a
+ # crash
+ git repack --cruft --max-cruft-size=1M -d
+ )
+'
+
test_done
--
2.42.0.310.g9604b54f73.dirty
^ permalink raw reply related
* batch-command wishlist [was: [TOPIC 02/12] Libification Goals and Progress]
From: Eric Wong @ 2023-10-03 0:52 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Taylor Blau
Taylor Blau <me@ttaylorr.com> wrote:
> * (Jonathan Tan) back to process isolation: is the short lifetime of the process
> important?
> * (Taylor Blau) seems like an impossible goal to be able to do multi-command
> executions in a single process, the code is just not designed for it.
-- Split out from https://lore.kernel.org/git/ZRrfN2lbg14IOLiK@nand.local/
Thanks for posting in an accessible format for non-JS/video users.
> * (Junio) is anybody using the `git cat-file --batch-command` mode that switches
> between batch and batch-check.
I've started using --batch-command, but still have to test
backwards compatibility with git 2.35 to ensure users of older
git aren't left out. I still try to support git 1.8.x, even...
But it would be nice if --batch-command grew more functionality:
* ability to add/remove alternates
* ability to specify a preferred alternate for a lookup[1]
* detect unlinked packs/removed repos
Not sure if cat-file is the place for it, but a persistent
process to deal with:
* `git config -f FILENAME ...' (especially --get-urlmatch --type=FOO)
* approxidate parsing for other tools[2]
Would also be nice...
Maybe I'll find the time and patience to implement this... *shrug*
Building C projects is pretty slow for me, even with -O0.
[1] I'm potentially dealing with 50-100k alternates, after all;
and potentially config files in the 300-500k line range...
> * (Patrick Steinhardt) they are longer lived, but only "middle" long-lived.
> GitLab limits the maximum runtime, on the order of ~minutes, at which point
> they are reaped.
> * (Taylor Blau) lots of issues besides memory leaks that would become an issue
> * (Jeff Hostetler) would be nice to keep memory-hungry components pinned across
> multiple command-equivalents.
> * (Taylor Blau): same issue as reading configuration.
sidenote: __attribute__((cleanup)) found in gcc + clang + tinycc
has greatly improved my life. Perhaps only supporting Free software
compilers and cross-compiling for everything else is in order :P
The only gotcha I've noticed with ((cleanup)) is it doesn't fire
on longjmp, but that's not a problem for git.
I've also been abusing more Perl5 over the years as a code
generator and better CPP to make dealing with tedious tasks
more acceptable.
[2] I did recently license the code of a standalone C++ executable
as GPL-2+ so I can copy approxidate parsing from git and
perhaps figure out enough C++ to use Xapian query parser
bindings instead of the hacky `git rev-parse --since=' thing
I do.
^ permalink raw reply
* VENDORS EOI
From: Abu Dhabi National Oil Company @ 2023-10-02 22:07 UTC (permalink / raw)
To: git
Greetings of the day,
We are inviting your esteemed company for vendor registration and
intending partners for Abu Dhabi National Oil Company (ADNOC)
2023/2024 projects.
These projects are open for all companies around the world, if
you have intention to participate in the process, please confirm
your interest by asking for Vendor Questionnaire and EOI.
We appreciate your interest in this invitation, and look forward
to your early response.
Kind Regards,
Mr. Mohamed Ghazi B.
Senior Project Manager
Projects@adnoc-purchase.com
^ permalink raw reply
* Re: What's cooking in git.git (Oct 2023, #01; Mon, 2)
From: Sergey Organov @ 2023-10-03 7:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqedic35u4.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
[...]
> * so/diff-merges-d (2023-09-11) 2 commits
> - diff-merges: introduce '-d' option
> - diff-merges: improve --diff-merges documentation
>
> Teach a new "-d" option that shows the patch against the first
> parent for merge commits (which is "--diff-merges=first-parent -p").
Which happens to naturally mean "show diff for all commits" for the
user.
>
> Letting a less useful combination of options squat on short-and-sweet
> "-d" feels dubious. source:
> <20230909125446.142715-1-sorganov@gmail.com>
I believe I've addressed this in details in my reply here:
<87o7hok8dx.fsf@osv.gnss.ru>, and got no further objections from you
since then, so I figure I'd ask to finally let the patch in.
To summarize my position here, "-d" meaning "show me *d*iff for all
commits", as implemented in the patch, is very mnemonic, has natural
semantics for "-d" in the context of "git log", and is straight to the
point. Therefore it is indeed short-and-sweet compared to the only
alternative proposed: "follow first parent only while traversing history
and show me diffs for all commits", that would indeed need a different
short-cut, if any.
Thanks,
-- Sergey Organov
^ permalink raw reply
* Is git-p4 maintained?
From: Yuri @ 2023-10-03 7:41 UTC (permalink / raw)
To: Git Mailing List
There are problems with the 'git p4 submit' command.
The 'git p4 submit' command first asks many questions like:
Unfortunately applying the change failed!
//xx/xx/xx/xx/xx/xx/xx#92 - was edit, reverted
What do you want to do?
[s]kip this commit but apply the rest, or [q]uit?
The files that it complains about aren't even involved in the commit
that is being pushed.
After all the questions are answered with "s" - git-p4 often fails to
push commits.
Deleted files and added files cause problems during 'git p4 submit' and
when such files are in the commit - 'git p4 submit' prints some error
messages and fails to push commits.
git-p4 says that it can't apply some patches and fails when it is
pushing commits that delete or add files.
The workaround is to place the deleted file back and remove the added
files before the 'git p4 submit' operation.
Is git-p4 maintained?
Is there any chance for these problems to be fixed?
Thanks,
Yuri
^ permalink raw reply
* [PATCH] test-lib: make sure TEST_DIRECTORY has no trailing slash
From: Štěpán Němec @ 2023-10-03 8:23 UTC (permalink / raw)
To: git
Turns out having `pwd` (which TEST_DIRECTORY defaults to) print $PWD
with a trailing slash isn't very difficult, in my case it went something
like
; tmux new-window -c ~/src/git/t/
[in the new window]
; sh ./t0000-basic.sh
PANIC: Running in a /home/stepnem/src/git/t/ that doesn't end in '/t'?
; pwd
/home/stepnem/src/git/t/
(tmux(1) apparently sets PWD in the environment in addition to calling
chdir(2), which seems enough to make at least some shells preserve the
trailing slash in `pwd` output.)
Strip the trailing slash, if present, to prevent bailing out with the
PANIC message in such cases.
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
t/test-lib.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1656c9eed006..3b6f1a17e349 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -35,6 +35,7 @@ else
# needing to exist.
TEST_DIRECTORY=$(cd "$TEST_DIRECTORY" && pwd) || exit 1
fi
+TEST_DIRECTORY="${TEST_DIRECTORY%/}"
if test -z "$TEST_OUTPUT_DIRECTORY"
then
# Similarly, override this to store the test-results subdir
base-commit: d0e8084c65cbf949038ae4cc344ac2c2efd77415
--
2.42.0
^ permalink raw reply related
* [PATCH 2/5] doc/diff-options: improve wording of the log.diffMerges mention
From: Štěpán Němec @ 2023-10-03 8:21 UTC (permalink / raw)
To: git
In-Reply-To: <20231003082107.3002173-1-stepnem@smrk.net>
Fix the grammar ("which default value is") and reword to match other
similar descriptions (say "configuration variable" instead of
"parameter", link to git-config(1)).
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
Documentation/diff-options.txt | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index ee256ec077be..48a5012748dd 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -53,9 +53,9 @@ ifdef::git-log[]
-m:::
This option makes diff output for merge commits to be shown in
the default format. `-m` will produce the output only if `-p`
- is given as well. The default format could be changed using
- `log.diffMerges` configuration parameter, which default value
- is `separate`.
+ is given as well. The default format can be specified using
+ the configuration variable `log.diffMerges` (see
+ linkgit:git-config[1]). It defaults to `separate`.
+
--diff-merges=first-parent:::
--diff-merges=1:::
--
2.42.0
^ permalink raw reply related
* [PATCH 3/5] git-jump: admit to passing merge mode args to ls-files
From: Štěpán Němec @ 2023-10-03 8:21 UTC (permalink / raw)
To: git
In-Reply-To: <20231003082107.3002173-1-stepnem@smrk.net>
There's even an example of such usage in the README.
Fixes: 67ba13e5a4b2 ("git-jump: pass "merge" arguments to ls-files")
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
contrib/git-jump/git-jump | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/contrib/git-jump/git-jump b/contrib/git-jump/git-jump
index 40c4b0d11107..47e0c557e63f 100755
--- a/contrib/git-jump/git-jump
+++ b/contrib/git-jump/git-jump
@@ -9,7 +9,7 @@ The <mode> parameter is one of:
diff: elements are diff hunks. Arguments are given to diff.
-merge: elements are merge conflicts. Arguments are ignored.
+merge: elements are merge conflicts. Arguments are given to ls-files -u.
grep: elements are grep hits. Arguments are given to git grep or, if
configured, to the command in `jump.grepCmd`.
--
2.42.0
^ permalink raw reply related
* [PATCH 1/5] Fix some typos, grammar or wording issues in the documentation
From: Štěpán Němec @ 2023-10-03 8:21 UTC (permalink / raw)
To: git
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
Documentation/SubmittingPatches | 10 +++++-----
Documentation/diff-options.txt | 2 +-
Documentation/git-branch.txt | 2 +-
Documentation/git-range-diff.txt | 2 +-
Documentation/git.txt | 2 +-
Documentation/gitattributes.txt | 2 +-
Documentation/giteveryday.txt | 2 +-
contrib/README | 4 ++--
strbuf.h | 8 ++++----
t/README | 30 ++++++++++++++----------------
10 files changed, 31 insertions(+), 33 deletions(-)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 973d7a81d449..1259549cd488 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -393,8 +393,8 @@ mailing list{security-ml}, instead of the public mailing list.
Learn to use format-patch and send-email if possible. These commands
are optimized for the workflow of sending patches, avoiding many ways
-your existing e-mail client that is optimized for "multipart/*" mime
-type e-mails to corrupt and render your patches unusable.
+your existing e-mail client (often optimized for "multipart/*" MIME
+type e-mails) might render your patches unusable.
People on the Git mailing list need to be able to read and
comment on the changes you are submitting. It is important for
@@ -515,8 +515,8 @@ repositories.
git://git.ozlabs.org/~paulus/gitk
- Those who are interested in improve gitk can volunteer to help Paul
- in maintaining it cf. <YntxL/fTplFm8lr6@cleo>.
+ Those who are interested in improving gitk can volunteer to help Paul
+ maintain it, cf. <YntxL/fTplFm8lr6@cleo>.
- `po/` comes from the localization coordinator, Jiang Xin:
@@ -556,7 +556,7 @@ help you find out who they are.
In any time between the (2)-(3) cycle, the maintainer may pick it up
from the list and queue it to `seen`, in order to make it easier for
-people play with it without having to pick up and apply the patch to
+people to play with it without having to pick up and apply the patch to
their trees themselves.
[[patch-status]]
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 35fae7c87c8a..ee256ec077be 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -301,7 +301,7 @@ ifndef::git-format-patch[]
-z::
ifdef::git-log[]
- Separate the commits with NULs instead of with new newlines.
+ Separate the commits with NULs instead of newlines.
+
Also, when `--raw` or `--numstat` has been given, do not munge
pathnames and use NULs as output field terminators.
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index d207da9101a5..4395aa935438 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -324,7 +324,7 @@ superproject's "origin/main", but tracks the submodule's "origin/main".
multiple times, in which case the last key becomes the primary
key. The keys supported are the same as those in `git
for-each-ref`. Sort order defaults to the value configured for the
- `branch.sort` variable if exists, or to sorting based on the
+ `branch.sort` variable if it exists, or to sorting based on the
full refname (including `refs/...` prefix). This lists
detached HEAD (if present) first, then local branches and
finally remote-tracking branches. See linkgit:git-config[1].
diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
index 0b393715d707..6c728dbe44b2 100644
--- a/Documentation/git-range-diff.txt
+++ b/Documentation/git-range-diff.txt
@@ -166,7 +166,7 @@ A typical output of `git range-diff` would look like this:
In this example, there are 3 old and 3 new commits, where the developer
removed the 3rd, added a new one before the first two, and modified the
-commit message of the 2nd commit as well its diff.
+commit message of the 2nd commit as well as its diff.
When the output goes to a terminal, it is color-coded by default, just
like regular `git diff`'s output. In addition, the first line (adding a
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 11228956cd5e..b246dc86c3f3 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -96,7 +96,7 @@ foo.bar= ...`) sets `foo.bar` to the empty string which `git config
to avoid ambiguity with `<name>` containing one.
+
This is useful for cases where you want to pass transitory
-configuration options to git, but are doing so on OS's where
+configuration options to git, but are doing so on OSes where
other processes might be able to read your cmdline
(e.g. `/proc/self/cmdline`), but not your environ
(e.g. `/proc/self/environ`). That behavior is the default on
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 6deb89a29677..7cdc68f54a5c 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1151,7 +1151,7 @@ will be stored via placeholder `%P`.
^^^^^^^^^^^^^^^^^^^^^^
This attribute controls the length of conflict markers left in
-the work tree file during a conflicted merge. Only setting to
+the work tree file during a conflicted merge. Only setting
the value to a positive integer has any meaningful effect.
For example, this line in `.gitattributes` can be used to tell the merge
diff --git a/Documentation/giteveryday.txt b/Documentation/giteveryday.txt
index faba2ef0881c..f74a33fb35b9 100644
--- a/Documentation/giteveryday.txt
+++ b/Documentation/giteveryday.txt
@@ -229,7 +229,7 @@ without a formal "merging". Or longhand +
git am -3 -k`
An alternate participant submission mechanism is using the
-`git request-pull` or pull-request mechanisms (e.g as used on
+`git request-pull` or pull-request mechanisms (e.g. as used on
GitHub (www.github.com) to notify your upstream of your
contribution.
diff --git a/contrib/README b/contrib/README
index 05f291c1f1d3..d3a327d04074 100644
--- a/contrib/README
+++ b/contrib/README
@@ -24,14 +24,14 @@ lesser degree various foreign SCM interfaces, so you know the
drill.
I expect that things that start their life in the contrib/ area
-to graduate out of contrib/ once they mature, either by becoming
+graduate out of contrib/ once they mature, either by becoming
projects on their own, or moving to the toplevel directory. On
the other hand, I expect I'll be proposing removal of disused
and inactive ones from time to time.
If you have new things to add to this area, please first propose
it on the git mailing list, and after a list discussion proves
-there are some general interests (it does not have to be a
+there is general interest (it does not have to be a
list-wide consensus for a tool targeted to a relatively narrow
audience -- for example I do not work with projects whose
upstream is svn, so I have no use for git-svn myself, but it is
diff --git a/strbuf.h b/strbuf.h
index fd43c46433a4..e959caca876a 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -12,9 +12,9 @@
struct string_list;
/**
- * strbuf's are meant to be used with all the usual C string and memory
+ * strbufs are meant to be used with all the usual C string and memory
* APIs. Given that the length of the buffer is known, it's often better to
- * use the mem* functions than a str* one (memchr vs. strchr e.g.).
+ * use the mem* functions than a str* one (e.g., memchr vs. strchr).
* Though, one has to be careful about the fact that str* functions often
* stop on NULs and that strbufs may have embedded NULs.
*
@@ -24,7 +24,7 @@ struct string_list;
* strbufs have some invariants that are very important to keep in mind:
*
* - The `buf` member is never NULL, so it can be used in any usual C
- * string operations safely. strbuf's _have_ to be initialized either by
+ * string operations safely. strbufs _have_ to be initialized either by
* `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
*
* Do *not* assume anything on what `buf` really is (e.g. if it is
@@ -37,7 +37,7 @@ struct string_list;
*
* - The `buf` member is a byte array that has at least `len + 1` bytes
* allocated. The extra byte is used to store a `'\0'`, allowing the
- * `buf` member to be a valid C-string. Every strbuf function ensure this
+ * `buf` member to be a valid C-string. All strbuf functions ensure this
* invariant is preserved.
*
* NOTE: It is OK to "play" with the buffer directly if you work it this
diff --git a/t/README b/t/README
index 61080859899b..a3e87385d065 100644
--- a/t/README
+++ b/t/README
@@ -262,8 +262,8 @@ The argument for --run, <test-selector>, is a list of description
substrings or globs or individual test numbers or ranges with an
optional negation prefix (of '!') that define what tests in a test
suite to include (or exclude, if negated) in the run. A range is two
-numbers separated with a dash and matches a range of tests with both
-ends been included. You may omit the first or the second number to
+numbers separated with a dash and matches an inclusive range of tests
+to run. You may omit the first or the second number to
mean "from the first test" or "up to the very last test" respectively.
The argument to --run is split on commas into separate strings,
@@ -274,10 +274,10 @@ text that you want to match includes a comma, use the glob character
on all tests that match either the glob *rebase* or the glob
*merge?cherry-pick*.
-If --run starts with an unprefixed number or range the initial
-set of tests to run is empty. If the first item starts with '!'
+If --run starts with an unprefixed number or range, the initial
+set of tests to run is empty. If the first item starts with '!',
all the tests are added to the initial set. After initial set is
-determined every test number or range is added or excluded from
+determined, every test number or range is added or excluded from
the set one by one, from left to right.
For example, to run only tests up to a specific test (21), one
@@ -579,11 +579,10 @@ This test harness library does the following things:
Recommended style
-----------------
-Here are some recommented styles when writing test case.
- - Keep test title the same line with test helper function itself.
+ - Keep test titles and helper function invocations on the same line.
- Take test_expect_success helper for example, write it like:
+ For example, with test_expect_success, write it like:
test_expect_success 'test title' '
... test body ...
@@ -595,10 +594,9 @@ Here are some recommented styles when writing test case.
'test title' \
'... test body ...'
+ - End the line with an opening single quote.
- - End the line with a single quote.
-
- - Indent the body of here-document, and use "<<-" instead of "<<"
+ - Indent here-document bodies, and use "<<-" instead of "<<"
to strip leading TABs used for indentation:
test_expect_success 'test something' '
@@ -624,7 +622,7 @@ Here are some recommented styles when writing test case.
'
- Quote or escape the EOF delimiter that begins a here-document if
- there is no parameter and other expansion in it, to signal readers
+ there is no parameter or other expansion in it, to signal readers
that they can skim it more casually:
cmd <<-\EOF
@@ -638,7 +636,7 @@ Do's & don'ts
Here are a few examples of things you probably should and shouldn't do
when writing tests.
-Here are the "do's:"
+The "do's:"
- Put all code inside test_expect_success and other assertions.
@@ -1237,8 +1235,8 @@ and it knows that the object ID of an empty tree is a certain
because the things the very basic core test tries to achieve is
to serve as a basis for people who are changing the Git internals
drastically. For these people, after making certain changes,
-not seeing failures from the basic test _is_ a failure. And
-such drastic changes to the core Git that even changes these
+not seeing failures from the basic test _is_ a failure. Any
+Git core changes so drastic that they change even these
otherwise supposedly stable object IDs should be accompanied by
an update to t0000-basic.sh.
@@ -1248,7 +1246,7 @@ knowledge of the core Git internals. If all the test scripts
hardcoded the object IDs like t0000-basic.sh does, that defeats
the purpose of t0000-basic.sh, which is to isolate that level of
validation in one place. Your test also ends up needing
-updating when such a change to the internal happens, so do _not_
+an update whenever the internals change, so do _not_
do it and leave the low level of validation to t0000-basic.sh.
Test coverage
--
2.42.0
^ permalink raw reply related
* [PATCH 4/5] doc/gitk: s/sticked/stuck/
From: Štěpán Němec @ 2023-10-03 8:21 UTC (permalink / raw)
To: git
In-Reply-To: <20231003082107.3002173-1-stepnem@smrk.net>
The terminology was changed in b0d12fc9b23a (Use the word 'stuck'
instead of 'sticked').
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
Documentation/gitk.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/gitk.txt b/Documentation/gitk.txt
index d50e9ed10e04..c2213bb77b38 100644
--- a/Documentation/gitk.txt
+++ b/Documentation/gitk.txt
@@ -26,7 +26,7 @@ changes each commit introduces are shown. Finally, it supports some
gitk-specific options.
gitk generally only understands options with arguments in the
-'sticked' form (see linkgit:gitcli[7]) due to limitations in the
+'stuck' form (see linkgit:gitcli[7]) due to limitations in the
command-line parser.
rev-list options and arguments
--
2.42.0
^ permalink raw reply related
* [PATCH 5/5] t/README: fix multi-prerequisite example
From: Štěpán Němec @ 2023-10-03 8:21 UTC (permalink / raw)
To: git
In-Reply-To: <20231003082107.3002173-1-stepnem@smrk.net>
With the broken quoting the test wouldn't even parse correctly, but
there's also the '==' instead of POSIX '=' (of the shells I tested,
busybox ash, bash and ksh (93 and OpenBSD) accept '==', dash and zsh do
not), and 'print 2' from Python 2 days.
(I assume the test failing due to 3 != 4 is intentional or immaterial.)
Fixes: 93a572461386 ("test-lib: Add support for multiple test prerequisites")
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
t/README | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/t/README b/t/README
index a3e87385d065..5b6b8eeb2a5f 100644
--- a/t/README
+++ b/t/README
@@ -886,7 +886,7 @@ see test-lib-functions.sh for the full list and their options.
rare case where your test depends on more than one:
test_expect_success PERL,PYTHON 'yo dawg' \
- ' test $(perl -E 'print eval "1 +" . qx[python -c "print 2"]') == "4" '
+ ' test $(perl -E '\''print eval "1 +" . qx[python -c "print(2)"]'\'') = "4" '
- test_expect_failure [<prereq>] <message> <script>
--
2.42.0
^ permalink raw reply related
* [PATCH] doc/cat-file: clarify description regarding various command forms
From: Štěpán Němec @ 2023-10-03 8:25 UTC (permalink / raw)
To: git; +Cc: avarab
The DESCRIPTION's "first form" is actually the 1st, 2nd, 3rd and 5th
form in SYNOPSIS, the "second form" is the 4th one.
Interestingly, this state of affairs was introduced in
97fe7250753b (cat-file docs: fix SYNOPSIS and "-h" output, 2021-12-28)
with the claim of "Now the two will match again." ("the two" being
DESCRIPTION and SYNOPSIS)...
Ordinals are hard, let's try talking about batch and non-batch mode
instead.
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
Documentation/git-cat-file.txt | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 0e4936d18263..1957f90335a4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -20,12 +20,15 @@ SYNOPSIS
DESCRIPTION
-----------
-In its first form, the command provides the content or the type of an object in
+This command can operate in two modes, depending on whether an option from
+the `--batch` family is specified.
+
+In non-batch mode, the command provides the content or the type of an object in
the repository. The type is required unless `-t` or `-p` is used to find the
object type, or `-s` is used to find the object size, or `--textconv` or
`--filters` is used (which imply type "blob").
-In the second form, a list of objects (separated by linefeeds) is provided on
+In batch mode, a list of objects (separated by linefeeds) is provided on
stdin, and the SHA-1, type, and size of each object is printed on stdout. The
output format can be overridden using the optional `<format>` argument. If
either `--textconv` or `--filters` was specified, the input is expected to
base-commit: d0e8084c65cbf949038ae4cc344ac2c2efd77415
--
2.42.0
^ permalink raw reply related
* Re: [PATCH 2/2] parse-options: use and require int pointer for OPT_CMDMODE
From: René Scharfe @ 2023-10-03 8:49 UTC (permalink / raw)
To: Oswald Buddenhagen; +Cc: Git List, Jeff King, Junio C Hamano
In-Reply-To: <ZQwdsfh1GQX0IOQs@ugly>
Am 21.09.23 um 12:40 schrieb Oswald Buddenhagen:
> On Wed, Sep 20, 2023 at 10:18:10AM +0200, René Scharfe wrote:
>> MSVC warns about all combinations.
>>
> yes, though that's not a problem: after we established that the
> underlying type is int, we can just have a cast in the initializer
> macro.
MSVC does some weird things in general; it's tempting to ignore it.
>>> so how about simply adding a (configure) test to ensure that
>>> there is actually no problem, and calling it a day?
>
>> If we base it on type size then we're making assumptions that I
>> find hard to justify.
>>
> the only one i can think of is signedness. i think this can be safely
> ignored as long as we use only small positive integers.
I don't fully understand the pointer-sign warning, so I'm not
confident enough to silence it.
René
^ permalink raw reply
* Re: [PATCH 2/2] parse-options: use and require int pointer for OPT_CMDMODE
From: René Scharfe @ 2023-10-03 8:49 UTC (permalink / raw)
To: phillip.wood, Junio C Hamano; +Cc: Jeff King, Oswald Buddenhagen, Git List
In-Reply-To: <000ff1b9-e7a5-4dd6-bc61-4b6761f66997@gmail.com>
Am 18.09.23 um 21:48 schrieb Phillip Wood:
> On 18/09/2023 18:11, Junio C Hamano wrote:
>>> Then in builtin/am.c at the top level we'd add
>>>
>>> MAKE_CMDMODE_SETTER(resume_type)
>>>
>>> and change the option definitions to look like
>>>
>>> OPT_CMDMODE(0, "continue", resume_type, &resume.mode, ...)
>>
>> Yup, that is ergonomic and corrects "The shape of a particular enum
>> may not match 'int'" issue nicely. I do not know how severe the
>> problem is that it is not quite type safe that we cannot enforce
>> resume_type is the same as typeof(resume.mode) here, though.
>
> We could use gcc's __builtin_types_compatible_p() if we're prepared to have two definitions of OPT_CMDMODE_F
>
> #if defined(__GNUC__)
> #define OPT_CMDMODE_F(s, l, n, v, h, i, f) { \
> ...
> .defval (i) + \
> BUILD_ASSERT_OR_ZERO(__builtin_types_compatible_p(enum n, __typeof__(v))), \
> }
> #else
> #define OPT_CMDMODE_F(s, l, n, v, h, i, f) { \
> ...
> .defval (i), \
> }
> #endif
>
> Best Wishes
>
> Phillip
That would work, but we can do a type check without a compiler builtin
by creating a function for that purpose. How about this?
--- >8 ---
Subject: [PATCH] parse-options: make OPT_CMDMODE type-safe
Some uses of OPT_CMDMODE point to an enum as value, but the option
parser dereferences it as an int pointer. Depending on the platform,
pointers to int and enums can be incompatible.
Add value accessor functions to struct option to dereference the pointer
safely, use them for PARSE_OPT_CMDMODE error detection and in the new
OPTION_SET_VALUE option type, add a typed version of OPT_CMDMODE named
OPT_CMDMODE_T, make OPT_CMDMODE only accept int and convert enum users
to declare the appropriate type.
The accessor functions for each type must be defined using
DEFINE_OPTION_VALUE_TYPE. OPT_CMDMODE needs the ones for int, so we
define them centrally in parse-options.h.
builtin/am.c::parse_opt_show_current_patch() uses the value pointer just
to calculate the offset of the enclosing struct resume_mode, making the
pointer type inconsequential. Use the correct type instead of int *
regardless, for consistency.
The enum used with OPT_CMDMODE in builtin/replace.c was anonymous. Give
it a name to allow specifying it in DEFINE_OPTION_VALUE_TYPE and then
OPT_CMDMODE_T.
Suggested-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Documentation/technical/api-parse-options.txt | 7 ++++
builtin/am.c | 32 ++++++++++------
builtin/help.c | 37 +++++++++++--------
builtin/ls-tree.c | 18 +++++----
builtin/rebase.c | 33 ++++++++++-------
builtin/replace.c | 37 ++++++++++++-------
builtin/stripspace.c | 14 ++++---
parse-options.c | 20 +++++++++-
parse-options.h | 36 ++++++++++++++++--
9 files changed, 159 insertions(+), 75 deletions(-)
diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
index 61fa6ee167..3958ab7c94 100644
--- a/Documentation/technical/api-parse-options.txt
+++ b/Documentation/technical/api-parse-options.txt
@@ -278,6 +278,13 @@ There are some macros to easily define options:
option has already set its value to the same `int_var`.
In new commands consider using subcommands instead.
+`OPT_CMDMODE_T(short, long, type_name, &enum_var, description, enum_val)`::
+ Same as `OPT_CMDMODE`, but allows specifying an enum variable
+ instead of an int.
+ Requires `DEFINE_OPTION_VALUE_TYPE(type_name, enum_type);` at
+ file scope to declare `type_name`; `enum_type` must be the type
+ of `enum_var`.
+
`OPT_SUBCOMMAND(long, &fn_ptr, subcommand_fn)`::
Define a subcommand. `subcommand_fn` is put into `fn_ptr` when
this subcommand is used.
diff --git a/builtin/am.c b/builtin/am.c
index 6655059a57..4642dc5099 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2268,6 +2268,8 @@ enum resume_type {
RESUME_ALLOW_EMPTY,
};
+DEFINE_OPTION_VALUE_TYPE(resume_type, enum resume_type);
+
struct resume_mode {
enum resume_type mode;
enum show_patch_type sub_mode;
@@ -2275,7 +2277,7 @@ struct resume_mode {
static int parse_opt_show_current_patch(const struct option *opt, const char *arg, int unset)
{
- int *opt_value = opt->value;
+ enum resume_type *opt_value = opt->value;
struct resume_mode *resume = container_of(opt_value, struct resume_mode, mode);
/*
@@ -2388,27 +2390,33 @@ int cmd_am(int argc, const char **argv, const char *prefix)
PARSE_OPT_NOARG),
OPT_STRING(0, "resolvemsg", &state.resolvemsg, NULL,
N_("override error message when patch failure occurs")),
- OPT_CMDMODE(0, "continue", &resume.mode,
+ OPT_CMDMODE_T(0, "continue", resume_type, &resume.mode,
N_("continue applying patches after resolving a conflict"),
RESUME_RESOLVED),
- OPT_CMDMODE('r', "resolved", &resume.mode,
+ OPT_CMDMODE_T('r', "resolved", resume_type, &resume.mode,
N_("synonyms for --continue"),
RESUME_RESOLVED),
- OPT_CMDMODE(0, "skip", &resume.mode,
+ OPT_CMDMODE_T(0, "skip", resume_type, &resume.mode,
N_("skip the current patch"),
RESUME_SKIP),
- OPT_CMDMODE(0, "abort", &resume.mode,
+ OPT_CMDMODE_T(0, "abort", resume_type, &resume.mode,
N_("restore the original branch and abort the patching operation"),
RESUME_ABORT),
- OPT_CMDMODE(0, "quit", &resume.mode,
+ OPT_CMDMODE_T(0, "quit", resume_type, &resume.mode,
N_("abort the patching operation but keep HEAD where it is"),
RESUME_QUIT),
- { OPTION_CALLBACK, 0, "show-current-patch", &resume.mode,
- "(diff|raw)",
- N_("show the patch being applied"),
- PARSE_OPT_CMDMODE | PARSE_OPT_OPTARG | PARSE_OPT_NONEG | PARSE_OPT_LITERAL_ARGHELP,
- parse_opt_show_current_patch, RESUME_SHOW_PATCH },
- OPT_CMDMODE(0, "allow-empty", &resume.mode,
+ {
+ .type = OPTION_CALLBACK,
+ .long_name = "show-current-patch",
+ OPTION_VALUE(resume_type, &resume.mode),
+ .argh = "(diff|raw)",
+ .help = N_("show the patch being applied"),
+ .flags = PARSE_OPT_CMDMODE | PARSE_OPT_OPTARG |
+ PARSE_OPT_NONEG | PARSE_OPT_LITERAL_ARGHELP,
+ .callback = parse_opt_show_current_patch,
+ .defval = RESUME_SHOW_PATCH,
+ },
+ OPT_CMDMODE_T(0, "allow-empty", resume_type, &resume.mode,
N_("record the empty patch as an empty commit"),
RESUME_ALLOW_EMPTY),
OPT_BOOL(0, "committer-date-is-author-date",
diff --git a/builtin/help.c b/builtin/help.c
index dc1fbe2b98..48437c8636 100644
--- a/builtin/help.c
+++ b/builtin/help.c
@@ -52,6 +52,8 @@ static enum help_action {
HELP_ACTION_CONFIG_SECTIONS_FOR_COMPLETION,
} cmd_mode;
+DEFINE_OPTION_VALUE_TYPE(help_action, enum help_action);
+
static const char *html_path;
static int verbose = 1;
static enum help_format help_format = HELP_FORMAT_NONE;
@@ -59,8 +61,8 @@ static int exclude_guides;
static int show_external_commands = -1;
static int show_aliases = -1;
static struct option builtin_help_options[] = {
- OPT_CMDMODE('a', "all", &cmd_mode, N_("print all available commands"),
- HELP_ACTION_ALL),
+ OPT_CMDMODE_T('a', "all", help_action, &cmd_mode,
+ N_("print all available commands"), HELP_ACTION_ALL),
OPT_BOOL(0, "external-commands", &show_external_commands,
N_("show external commands in --all")),
OPT_BOOL(0, "aliases", &show_aliases, N_("show aliases in --all")),
@@ -72,20 +74,23 @@ static struct option builtin_help_options[] = {
HELP_FORMAT_INFO),
OPT__VERBOSE(&verbose, N_("print command description")),
- OPT_CMDMODE('g', "guides", &cmd_mode, N_("print list of useful guides"),
- HELP_ACTION_GUIDES),
- OPT_CMDMODE(0, "user-interfaces", &cmd_mode,
- N_("print list of user-facing repository, command and file interfaces"),
- HELP_ACTION_USER_INTERFACES),
- OPT_CMDMODE(0, "developer-interfaces", &cmd_mode,
- N_("print list of file formats, protocols and other developer interfaces"),
- HELP_ACTION_DEVELOPER_INTERFACES),
- OPT_CMDMODE('c', "config", &cmd_mode, N_("print all configuration variable names"),
- HELP_ACTION_CONFIG),
- OPT_CMDMODE_F(0, "config-for-completion", &cmd_mode, "",
- HELP_ACTION_CONFIG_FOR_COMPLETION, PARSE_OPT_HIDDEN),
- OPT_CMDMODE_F(0, "config-sections-for-completion", &cmd_mode, "",
- HELP_ACTION_CONFIG_SECTIONS_FOR_COMPLETION, PARSE_OPT_HIDDEN),
+ OPT_CMDMODE_T('g', "guides", help_action, &cmd_mode,
+ N_("print list of useful guides"), HELP_ACTION_GUIDES),
+ OPT_CMDMODE_T(0, "user-interfaces", help_action, &cmd_mode,
+ N_("print list of user-facing repository, command and file interfaces"),
+ HELP_ACTION_USER_INTERFACES),
+ OPT_CMDMODE_T(0, "developer-interfaces", help_action, &cmd_mode,
+ N_("print list of file formats, protocols and other developer interfaces"),
+ HELP_ACTION_DEVELOPER_INTERFACES),
+ OPT_CMDMODE_T('c', "config", help_action, &cmd_mode,
+ N_("print all configuration variable names"),
+ HELP_ACTION_CONFIG),
+ OPT_CMDMODE_T_F(0, "config-for-completion", help_action, &cmd_mode, "",
+ HELP_ACTION_CONFIG_FOR_COMPLETION, PARSE_OPT_HIDDEN),
+ OPT_CMDMODE_T_F(0, "config-sections-for-completion",
+ help_action, &cmd_mode, "",
+ HELP_ACTION_CONFIG_SECTIONS_FOR_COMPLETION,
+ PARSE_OPT_HIDDEN),
OPT_END(),
};
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 209d2dc0d5..2e7faf510d 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -339,6 +339,8 @@ static struct ls_tree_cmdmode_to_fmt ls_tree_cmdmode_format[] = {
},
};
+DEFINE_OPTION_VALUE_TYPE(cmdmode, enum ls_tree_cmdmode);
+
int cmd_ls_tree(int argc, const char **argv, const char *prefix)
{
struct object_id oid;
@@ -358,14 +360,14 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
LS_SHOW_TREES),
OPT_BOOL('z', NULL, &null_termination,
N_("terminate entries with NUL byte")),
- OPT_CMDMODE('l', "long", &cmdmode, N_("include object size"),
- MODE_LONG),
- OPT_CMDMODE(0, "name-only", &cmdmode, N_("list only filenames"),
- MODE_NAME_ONLY),
- OPT_CMDMODE(0, "name-status", &cmdmode, N_("list only filenames"),
- MODE_NAME_STATUS),
- OPT_CMDMODE(0, "object-only", &cmdmode, N_("list only objects"),
- MODE_OBJECT_ONLY),
+ OPT_CMDMODE_T('l', "long", cmdmode, &cmdmode,
+ N_("include object size"), MODE_LONG),
+ OPT_CMDMODE_T(0, "name-only", cmdmode, &cmdmode,
+ N_("list only filenames"), MODE_NAME_ONLY),
+ OPT_CMDMODE_T(0, "name-status", cmdmode, &cmdmode,
+ N_("list only filenames"), MODE_NAME_STATUS),
+ OPT_CMDMODE_T(0, "object-only", cmdmode, &cmdmode,
+ N_("list only objects"), MODE_OBJECT_ONLY),
OPT_BOOL(0, "full-name", &full_name, N_("use full path names")),
OPT_BOOL(0, "full-tree", &full_tree,
N_("list entire tree; not just current directory "
diff --git a/builtin/rebase.c b/builtin/rebase.c
index ed15accec9..50739327bd 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -75,6 +75,8 @@ enum action {
ACTION_SHOW_CURRENT_PATCH
};
+DEFINE_OPTION_VALUE_TYPE(action, enum action);
+
static const char *action_names[] = {
"undefined",
"continue",
@@ -1116,20 +1118,23 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
OPT_BIT(0, "no-ff", &options.flags,
N_("cherry-pick all commits, even if unchanged"),
REBASE_FORCE),
- OPT_CMDMODE(0, "continue", &options.action, N_("continue"),
- ACTION_CONTINUE),
- OPT_CMDMODE(0, "skip", &options.action,
- N_("skip current patch and continue"), ACTION_SKIP),
- OPT_CMDMODE(0, "abort", &options.action,
- N_("abort and check out the original branch"),
- ACTION_ABORT),
- OPT_CMDMODE(0, "quit", &options.action,
- N_("abort but keep HEAD where it is"), ACTION_QUIT),
- OPT_CMDMODE(0, "edit-todo", &options.action, N_("edit the todo list "
- "during an interactive rebase"), ACTION_EDIT_TODO),
- OPT_CMDMODE(0, "show-current-patch", &options.action,
- N_("show the patch file being applied or merged"),
- ACTION_SHOW_CURRENT_PATCH),
+ OPT_CMDMODE_T(0, "continue", action, &options.action,
+ N_("continue"), ACTION_CONTINUE),
+ OPT_CMDMODE_T(0, "skip", action, &options.action,
+ N_("skip current patch and continue"),
+ ACTION_SKIP),
+ OPT_CMDMODE_T(0, "abort", action, &options.action,
+ N_("abort and check out the original branch"),
+ ACTION_ABORT),
+ OPT_CMDMODE_T(0, "quit", action, &options.action,
+ N_("abort but keep HEAD where it is"),
+ ACTION_QUIT),
+ OPT_CMDMODE_T(0, "edit-todo", action, &options.action,
+ N_("edit the todo list during an interactive rebase"),
+ ACTION_EDIT_TODO),
+ OPT_CMDMODE_T(0, "show-current-patch", action, &options.action,
+ N_("show the patch file being applied or merged"),
+ ACTION_SHOW_CURRENT_PATCH),
OPT_CALLBACK_F(0, "apply", &options, NULL,
N_("use apply strategies to rebase"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG,
diff --git a/builtin/replace.c b/builtin/replace.c
index da59600ad2..09aebc9de2 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -539,27 +539,36 @@ static int convert_graft_file(int force)
return -1;
}
+enum cmdmode {
+ MODE_UNSPECIFIED = 0,
+ MODE_LIST,
+ MODE_DELETE,
+ MODE_EDIT,
+ MODE_GRAFT,
+ MODE_CONVERT_GRAFT_FILE,
+ MODE_REPLACE
+};
+
+DEFINE_OPTION_VALUE_TYPE(cmdmode, enum cmdmode);
int cmd_replace(int argc, const char **argv, const char *prefix)
{
int force = 0;
int raw = 0;
const char *format = NULL;
- enum {
- MODE_UNSPECIFIED = 0,
- MODE_LIST,
- MODE_DELETE,
- MODE_EDIT,
- MODE_GRAFT,
- MODE_CONVERT_GRAFT_FILE,
- MODE_REPLACE
- } cmdmode = MODE_UNSPECIFIED;
+ enum cmdmode cmdmode = MODE_UNSPECIFIED;
struct option options[] = {
- OPT_CMDMODE('l', "list", &cmdmode, N_("list replace refs"), MODE_LIST),
- OPT_CMDMODE('d', "delete", &cmdmode, N_("delete replace refs"), MODE_DELETE),
- OPT_CMDMODE('e', "edit", &cmdmode, N_("edit existing object"), MODE_EDIT),
- OPT_CMDMODE('g', "graft", &cmdmode, N_("change a commit's parents"), MODE_GRAFT),
- OPT_CMDMODE(0, "convert-graft-file", &cmdmode, N_("convert existing graft file"), MODE_CONVERT_GRAFT_FILE),
+ OPT_CMDMODE_T('l', "list", cmdmode, &cmdmode,
+ N_("list replace refs"), MODE_LIST),
+ OPT_CMDMODE_T('d', "delete", cmdmode, &cmdmode,
+ N_("delete replace refs"), MODE_DELETE),
+ OPT_CMDMODE_T('e', "edit", cmdmode, &cmdmode,
+ N_("edit existing object"), MODE_EDIT),
+ OPT_CMDMODE_T('g', "graft", cmdmode, &cmdmode,
+ N_("change a commit's parents"), MODE_GRAFT),
+ OPT_CMDMODE_T(0, "convert-graft-file", cmdmode, &cmdmode,
+ N_("convert existing graft file"),
+ MODE_CONVERT_GRAFT_FILE),
OPT_BOOL_F('f', "force", &force, N_("replace the ref if it exists"),
PARSE_OPT_NOCOMPLETE),
OPT_BOOL(0, "raw", &raw, N_("do not pretty-print contents for --edit")),
diff --git a/builtin/stripspace.c b/builtin/stripspace.c
index 7b700a9fb1..82b9a95a4f 100644
--- a/builtin/stripspace.c
+++ b/builtin/stripspace.c
@@ -29,6 +29,8 @@ enum stripspace_mode {
COMMENT_LINES
};
+DEFINE_OPTION_VALUE_TYPE(mode, enum stripspace_mode);
+
int cmd_stripspace(int argc, const char **argv, const char *prefix)
{
struct strbuf buf = STRBUF_INIT;
@@ -36,12 +38,12 @@ int cmd_stripspace(int argc, const char **argv, const char *prefix)
int nongit;
const struct option options[] = {
- OPT_CMDMODE('s', "strip-comments", &mode,
- N_("skip and remove all lines starting with comment character"),
- STRIP_COMMENTS),
- OPT_CMDMODE('c', "comment-lines", &mode,
- N_("prepend comment character and space to each line"),
- COMMENT_LINES),
+ OPT_CMDMODE_T('s', "strip-comments", mode, &mode,
+ N_("skip and remove all lines starting with comment character"),
+ STRIP_COMMENTS),
+ OPT_CMDMODE_T('c', "comment-lines", mode, &mode,
+ N_("prepend comment character and space to each line"),
+ COMMENT_LINES),
OPT_END()
};
diff --git a/parse-options.c b/parse-options.c
index e8e076c3a6..63a2247128 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -85,7 +85,7 @@ static enum parse_opt_result opt_command_mode_error(
if (that == opt ||
!(that->flags & PARSE_OPT_CMDMODE) ||
that->value != opt->value ||
- that->defval != *(int *)opt->value)
+ that->defval != opt->get_value(opt->value))
continue;
if (that->long_name)
@@ -122,7 +122,8 @@ static enum parse_opt_result get_value(struct parse_opt_ctx_t *p,
* is not a grave error, so let it pass.
*/
if ((opt->flags & PARSE_OPT_CMDMODE) &&
- *(int *)opt->value && *(int *)opt->value != opt->defval)
+ opt->get_value(opt->value) &&
+ opt->get_value(opt->value) != opt->defval)
return opt_command_mode_error(opt, all_opts, flags);
switch (opt->type) {
@@ -160,6 +161,10 @@ static enum parse_opt_result get_value(struct parse_opt_ctx_t *p,
*(int *)opt->value = unset ? 0 : opt->defval;
return 0;
+ case OPTION_SET_VALUE:
+ opt->set_value(opt->value, unset ? 0 : opt->defval);
+ return 0;
+
case OPTION_STRING:
if (unset)
*(const char **)opt->value = NULL;
@@ -483,11 +488,21 @@ static void parse_options_check(const struct option *opts)
if (opts->type == OPTION_SET_INT && !opts->defval &&
opts->long_name && !(opts->flags & PARSE_OPT_NONEG))
optbug(opts, "OPTION_SET_INT 0 should not be negatable");
+ if (opts->type == OPTION_SET_VALUE && !opts->defval &&
+ opts->long_name && !(opts->flags & PARSE_OPT_NONEG))
+ optbug(opts, "OPTION_SET_VALUE 0 should not be negatable");
+ if (opts->type == OPTION_SET_VALUE &&
+ (!opts->get_value || !opts->set_value))
+ optbug(opts, "OPTION_SET_VALUE requires accessors");
+ if ((opts->flags & PARSE_OPT_CMDMODE) &&
+ (!opts->get_value || !opts->set_value))
+ optbug(opts, "PARSE_OPT_CMDMODE requires accessors");
switch (opts->type) {
case OPTION_COUNTUP:
case OPTION_BIT:
case OPTION_NEGBIT:
case OPTION_SET_INT:
+ case OPTION_SET_VALUE:
case OPTION_NUMBER:
if ((opts->flags & PARSE_OPT_OPTARG) ||
!(opts->flags & PARSE_OPT_NOARG))
@@ -611,6 +626,7 @@ static void show_negated_gitcomp(const struct option *opts, int show_all,
case OPTION_NEGBIT:
case OPTION_COUNTUP:
case OPTION_SET_INT:
+ case OPTION_SET_VALUE:
has_unset_form = 1;
break;
default:
diff --git a/parse-options.h b/parse-options.h
index 57a7fe9d91..764e7f7896 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -20,6 +20,7 @@ enum parse_opt_type {
OPTION_BITOP,
OPTION_COUNTUP,
OPTION_SET_INT,
+ OPTION_SET_VALUE,
/* options with arguments (usually) */
OPTION_STRING,
OPTION_INTEGER,
@@ -158,8 +159,34 @@ struct option {
parse_opt_ll_cb *ll_callback;
intptr_t extra;
parse_opt_subcommand_fn *subcommand_fn;
+ intptr_t (*get_value)(void *);
+ void (*set_value)(void *, intptr_t);
};
+#define DEFINE_OPTION_VALUE_TYPE(type_name, type) \
+static inline intptr_t type_name##__get(void *void_ptr) \
+{ \
+ type *ptr = void_ptr; \
+ return (intptr_t)*ptr; \
+} \
+static inline void type_name##__set(void *void_ptr, intptr_t value) \
+{ \
+ type *ptr = void_ptr; \
+ *ptr = (type)value; \
+} \
+static inline void *type_name##__check(type *ptr) \
+{ \
+ return ptr; \
+} \
+static inline void *type_name##__check(type *ptr)
+
+DEFINE_OPTION_VALUE_TYPE(int, int);
+
+#define OPTION_VALUE(type_name, v) \
+ .get_value = type_name##__get, \
+ .set_value = type_name##__set, \
+ .value = (1 ? (v) : type_name##__check(v))
+
#define OPT_BIT_F(s, l, v, h, b, f) { \
.type = OPTION_BIT, \
.short_name = (s), \
@@ -256,15 +283,18 @@ struct option {
.flags = PARSE_OPT_NOARG | PARSE_OPT_HIDDEN, \
.defval = 1, \
}
-#define OPT_CMDMODE_F(s, l, v, h, i, f) { \
- .type = OPTION_SET_INT, \
+
+#define OPT_CMDMODE_T_F(s, l, t, v, h, i, f) { \
+ .type = OPTION_SET_VALUE, \
.short_name = (s), \
.long_name = (l), \
- .value = (v), \
+ OPTION_VALUE(t, (v)), \
.help = (h), \
.flags = PARSE_OPT_CMDMODE|PARSE_OPT_NOARG|PARSE_OPT_NONEG | (f), \
.defval = (i), \
}
+#define OPT_CMDMODE_T(s, l, t, v, h, i) OPT_CMDMODE_T_F(s, l, t, v, h, i, 0)
+#define OPT_CMDMODE_F(s, l, v, h, i, f) OPT_CMDMODE_T_F(s, l, int, v, h, i, f)
#define OPT_CMDMODE(s, l, v, h, i) OPT_CMDMODE_F(s, l, v, h, i, 0)
#define OPT_INTEGER(s, l, v, h) OPT_INTEGER_F(s, l, v, h, 0)
--
2.42.0
^ permalink raw reply related
* [PATCH] parse-options: drop unused parse_opt_ctx_t member
From: René Scharfe @ 2023-10-03 8:55 UTC (permalink / raw)
To: Git List
5c387428f1 (parse-options: don't emit "ambiguous option" for aliases,
2019-04-29) added "updated_options" to struct parse_opt_ctx_t, but it
has never been used. Remove it.
---
parse-options.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/parse-options.h b/parse-options.h
index 57a7fe9d91..4a66ec3bf5 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -459,7 +459,6 @@ struct parse_opt_ctx_t {
unsigned has_subcommands;
const char *prefix;
const char **alias_groups; /* must be in groups of 3 elements! */
- struct option *updated_options;
};
void parse_options_start(struct parse_opt_ctx_t *ctx,
--
2.42.0
^ permalink raw reply related
* Re: [PATCH 2/2] parse-options: use and require int pointer for OPT_CMDMODE
From: Oswald Buddenhagen @ 2023-10-03 9:38 UTC (permalink / raw)
To: René Scharfe; +Cc: Git List, Jeff King, Junio C Hamano
In-Reply-To: <d9defed8-4e7e-4b84-be3d-57155d973320@web.de>
On Tue, Oct 03, 2023 at 10:49:12AM +0200, René Scharfe wrote:
>Am 21.09.23 um 12:40 schrieb Oswald Buddenhagen:
>> On Wed, Sep 20, 2023 at 10:18:10AM +0200, René Scharfe wrote:
>>> If we base it on type size then we're making assumptions that I
>>> find hard to justify.
>>>
>> the only one i can think of is signedness. i think this can be safely
>> ignored as long as we use only small positive integers.
>
>I don't fully understand the pointer-sign warning, so I'm not
>confident enough to silence it.
>
in theory, differently signed integers may have completely different
binary representations. but afaik, that only ever mattered for negative
numbers. and c++20 actually codifies two's complement, which was the
de-facto standard for decades already.
so in practice it just means that we may be assigning a value that is
outside the range of the actual type. but small positive values are
compatible between signed and unsiged types.
regards
^ permalink raw reply
* [Outreachy] Introduction and Interest in Contributing to the Git Community
From: Isoken Ibizugbe @ 2023-10-03 10:19 UTC (permalink / raw)
To: git
Dear Git Community,
I hope this email finds you well. My name is Isoken Ibizugbe, and I am
writing to express my strong interest in joining the Git community as
a contributor to the FOSS project. I recently came across the project
description regarding "Moving existing tests to a unit testing
framework" and was particularly intrigued by the opportunity to be
part of this exciting endeavor.
As an aspiring software engineer, I have always admired the incredible
work done by the Git community in developing and maintaining this
widely-used version control system. The project's commitment to
fostering collaboration and innovation aligns perfectly with my values
and aspirations as a developer.
I understand that Christian Couder is the mentor for this project, and
I would be honored to have the opportunity to work under his guidance
and expertise. I would greatly appreciate any advice or direction he
can provide to help me get started on this journey.
I am eager to learn, collaborate with the community, and contribute
meaningfully to this project. Please let me know how I can formally
start my journey as a Git contributor and if there are any specific
guidelines or resources that you recommend for newcomers, as it was a
bit confusing process for me to join this mailing list.
Once again, thank you for your time and for providing an opportunity
for individuals like me to contribute to this remarkable project. I am
enthusiastic about the potential of this project and the journey
ahead.
Best regards,
Isoken
^ permalink raw reply
* [Outreachy] Move existing tests to a unit testing framework
From: Luma @ 2023-10-03 14:30 UTC (permalink / raw)
To: git
Hi;
My name is Luma, and I wanted to take a moment to introduce myself
and share some
insights on an essential aspect of avoiding pipes in git related
commands in test scripts.
I am an outreachy applicant for the December 2023 cohort and look
forward to learning from you.
One common practice in shell scripting is the use of pipes (|) to
chain multiple commands together.
While pipes are incredibly versatile and useful, excessive use of
them can lead to script complexity.
I plan to avoid overusing pipes in test scripts by: leveraging command
options, using temporary files
and using functions and variables to break down complex pipelines.
If you have any questions on pipes, I'm always here to learn and share
knowledge.
Best regards,
Luma.
^ permalink raw reply
* Re: batch-command wishlist [was: [TOPIC 02/12] Libification Goals and Progress]
From: Junio C Hamano @ 2023-10-03 16:15 UTC (permalink / raw)
To: Eric Wong; +Cc: git, Taylor Blau
In-Reply-To: <20231003005251.M353509@dcvr>
Eric Wong <e@80x24.org> writes:
> Taylor Blau <me@ttaylorr.com> wrote:
>> * (Jonathan Tan) back to process isolation: is the short lifetime of the process
>> important?
>> * (Taylor Blau) seems like an impossible goal to be able to do multi-command
>> executions in a single process, the code is just not designed for it.
>
> -- Split out from https://lore.kernel.org/git/ZRrfN2lbg14IOLiK@nand.local/
> Thanks for posting in an accessible format for non-JS/video users.
>
>> * (Junio) is anybody using the `git cat-file --batch-command` mode that switches
>> between batch and batch-check.
This is not exactly what I asked about---I was asking about the use
of the "a long living process serves many requests" pattern ;-)
> But it would be nice if --batch-command grew more functionality:
>
> * ability to add/remove alternates
> * ability to specify a preferred alternate for a lookup[1]
> * detect unlinked packs/removed repos
To the third you would also want to notice an updated index, too.
> Not sure if cat-file is the place for it, but a persistent
> process to deal with:
>
> * `git config -f FILENAME ...' (especially --get-urlmatch --type=FOO)
> * approxidate parsing for other tools[2]
>
> Would also be nice...
"git daemon" ;-)?
^ permalink raw reply
* [PATCH] builtin/repack.c: avoid making cruft packs preferred
From: Taylor Blau @ 2023-10-03 16:27 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt
When doing a `--geometric` repack, we make sure that the preferred pack
(if writing a MIDX) is the largest pack that we *didn't* repack. That
has the effect of keeping the preferred pack in sync with the pack
containing a majority of the repository's reachable objects.
But if the repository happens to double in size, we'll repack
everything. Here we don't specify any `--preferred-pack`, and instead
let the MIDX code choose.
In the past, that worked fine, since there would only be one pack to
choose from: the one we just wrote. But it's no longer necessarily the
case that there is one pack to choose from. It's possible that the
repository also has a cruft pack, too.
If the cruft pack happens to come earlier in lexical order (and has an
earlier mtime than any non-cruft pack), we'll pick that pack as
preferred. This makes it impossible to reuse chunks of the reachable
pack verbatim from pack-objects, so is sub-optimal.
Luckily, this is a somewhat rare circumstance to be in, since we would
have to repack the entire repository during a `--geometric` repack, and
the cruft pack would have to sort ahead of the pack we just created.
Note that this behavior is usually just a performance regression. But
it's possible it could be a correctness issue.
Suppose an object was duplicated among the cruft and non-cruft pack. The
MIDX will pick the one from the pack with the lowest mtime, which will
always be the cruft one. But if the non-cruft pack happens to sort
earlier in lexical order, we'll treat that one as preferred, but not all
duplicates will be resolved in favor of that pack.
So if we happened to have an object which appears in both packs
(e.g., due to a cruft object being freshened, causing it to appear
loose, and then repacking it via the `--geometric` repack) it's possible
the duplicate would be picked from the non-preferred pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
I've had this sitting in my patch queue for a while now. It's a
non-critical performance fix that avoids the repack/MIDX machinery from
ever choosing a cruft pack as preferred when writing a MIDX bitmap
without a given --preferred-pack.
There is no correctness issue here, but choosing a pack with few/no
reachable objects means that our pack reuse mechanism will rarely kick
in, resulting in performance degradation.
builtin/repack.c | 47 ++++++++++++++++++++++++++++++++++++++++-
t/t7704-repack-cruft.sh | 39 ++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+), 1 deletion(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 04770b15fe..a1a893d952 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -355,6 +355,18 @@ static struct generated_pack_data *populate_pack_exts(const char *name)
return data;
}
+static int has_pack_ext(const struct generated_pack_data *data,
+ const char *ext)
+{
+ int i;
+ for (i = 0; i < ARRAY_SIZE(exts); i++) {
+ if (strcmp(exts[i].name, ext))
+ continue;
+ return !!data->tempfiles[i];
+ }
+ BUG("unknown pack extension: '%s'", ext);
+}
+
static void repack_promisor_objects(const struct pack_objects_args *args,
struct string_list *names)
{
@@ -772,6 +784,7 @@ static void midx_included_packs(struct string_list *include,
static int write_midx_included_packs(struct string_list *include,
struct pack_geometry *geometry,
+ struct string_list *names,
const char *refs_snapshot,
int show_progress, int write_bitmaps)
{
@@ -801,6 +814,38 @@ static int write_midx_included_packs(struct string_list *include,
if (preferred)
strvec_pushf(&cmd.args, "--preferred-pack=%s",
pack_basename(preferred));
+ else if (names->nr) {
+ /* The largest pack was repacked, meaning that either
+ * one or two packs exist depending on whether the
+ * repository has a cruft pack or not.
+ *
+ * Select the non-cruft one as preferred to encourage
+ * pack-reuse among packs containing reachable objects
+ * over unreachable ones.
+ *
+ * (Note we could write multiple packs here if
+ * `--max-pack-size` was given, but any one of them
+ * will suffice, so pick the first one.)
+ */
+ for_each_string_list_item(item, names) {
+ struct generated_pack_data *data = item->util;
+ if (has_pack_ext(data, ".mtimes"))
+ continue;
+
+ strvec_pushf(&cmd.args, "--preferred-pack=pack-%s.pack",
+ item->string);
+ break;
+ }
+ } else {
+ /*
+ * No packs were kept, and no packs were written. The
+ * only thing remaining are .keep packs (unless
+ * --pack-kept-objects was given).
+ *
+ * Set the `--preferred-pack` arbitrarily here.
+ */
+ ;
+ }
if (refs_snapshot)
strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot);
@@ -1360,7 +1405,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
struct string_list include = STRING_LIST_INIT_NODUP;
midx_included_packs(&include, &existing, &names, &geometry);
- ret = write_midx_included_packs(&include, &geometry,
+ ret = write_midx_included_packs(&include, &geometry, &names,
refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
show_progress, write_bitmaps > 0);
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index dc86ca8269..be3735dff0 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -372,4 +372,43 @@ test_expect_success '--max-cruft-size ignores non-local packs' '
)
'
+test_expect_success 'reachable packs are preferred over cruft ones' '
+ repo="cruft-preferred-packs" &&
+ git init "$repo" &&
+ (
+ cd "$repo" &&
+
+ # This test needs to exercise careful control over when a MIDX
+ # is and is not written. Unset the corresponding TEST variable
+ # accordingly.
+ sane_unset GIT_TEST_MULTI_PACK_INDEX &&
+
+ test_commit base &&
+ test_commit --no-tag cruft &&
+
+ non_cruft="$(echo base | git pack-objects --revs $packdir/pack)" &&
+ # Write a cruft pack which both (a) sorts ahead of the non-cruft
+ # pack in lexical order, and (b) has an older mtime to appease
+ # the MIDX preferred pack selection routine.
+ cruft="$(echo pack-$non_cruft.pack | git pack-objects --cruft $packdir/pack-A)" &&
+ test-tool chmtime -1000 $packdir/pack-A-$cruft.pack &&
+
+ test_commit other &&
+ git repack -d &&
+
+ git repack --geometric 2 -d --write-midx --write-bitmap-index &&
+
+ # After repacking, there are two packs left: one reachable one
+ # (which is the result of combining both of the existing two
+ # non-cruft packs), and one cruft pack.
+ find .git/objects/pack -type f -name "*.pack" >packs &&
+ test_line_count = 2 packs &&
+
+ # Make sure that the pack we just wrote is marked as preferred,
+ # not the cruft one.
+ pack="$(test-tool read-midx --preferred-pack $objdir)" &&
+ test_path_is_missing "$packdir/$(basename "$pack" ".idx").mtimes"
+ )
+'
+
test_done
--
2.42.0.311.ga7f7a0e966.dirty
^ permalink raw reply related
* Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred
From: Taylor Blau @ 2023-10-03 16:30 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <19d9aae08eab05c6b5dda4c2090236b1c3f62998.1696349955.git.me@ttaylorr.com>
On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:
> I've had this sitting in my patch queue for a while now. It's a
> non-critical performance fix that avoids the repack/MIDX machinery from
> ever choosing a cruft pack as preferred when writing a MIDX bitmap
> without a given --preferred-pack.
>
> There is no correctness issue here, but choosing a pack with few/no
> reachable objects means that our pack reuse mechanism will rarely kick
> in, resulting in performance degradation.
>
> builtin/repack.c | 47 ++++++++++++++++++++++++++++++++++++++++-
> t/t7704-repack-cruft.sh | 39 ++++++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+), 1 deletion(-)
Oops, I should have mentioned that this is meant to be applied on top of
'tb/multi-cruft-pack' to reduce the conflict resolution burden. Sorry
about that.
Thanks,
Taylor
^ permalink raw reply
* [PATCH v2] git-status.txt: fix minor asciidoc format issue
From: cousteau via GitGitGadget @ 2023-10-03 16:33 UTC (permalink / raw)
To: git; +Cc: Javier Mora, cousteau, Javier Mora
In-Reply-To: <pull.1591.git.1696193527923.gitgitgadget@gmail.com>
From: Javier Mora <cousteaulecommandant@gmail.com>
The paragraph below the list of short option combinations
isn't correctly formatted, making the result hard to read.
Signed-off-by: Javier Mora <cousteaulecommandant@gmail.com>
---
git-status.txt: minor asciidoc format correction
The paragraph below the list of short option combinations was hard to
read; turns out it wasn't correctly formatted in asciidoc.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1591%2Fcousteaulecommandant%2Fman-git-status-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1591/cousteaulecommandant/man-git-status-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1591
Range-diff vs v1:
1: b3c97ca9e0f ! 1: 811885a275f git-status.txt: fix minor asciidoc format issue
@@ Commit message
## Documentation/git-status.txt ##
@@ Documentation/git-status.txt: U U unmerged, both modified
- ....
Submodules have more state and instead report
+
- M the submodule has a different HEAD than
- recorded in the index
- m the submodule has modified content
- ? the submodule has untracked files
-+
+* 'M' = the submodule has a different HEAD than recorded in the index
+* 'm' = the submodule has modified content
+* '?' = the submodule has untracked files
-+
+
since modified content or untracked files in a submodule cannot be added
via `git add` in the superproject to prepare a commit.
-
Documentation/git-status.txt | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index b27d127b5e2..48f46eb2047 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -246,10 +246,9 @@ U U unmerged, both modified
Submodules have more state and instead report
- M the submodule has a different HEAD than
- recorded in the index
- m the submodule has modified content
- ? the submodule has untracked files
+* 'M' = the submodule has a different HEAD than recorded in the index
+* 'm' = the submodule has modified content
+* '?' = the submodule has untracked files
since modified content or untracked files in a submodule cannot be added
via `git add` in the superproject to prepare a commit.
base-commit: d0e8084c65cbf949038ae4cc344ac2c2efd77415
--
gitgitgadget
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox