* [RFC PATCH 0/1] maintenance: separate parallelism safe and unsafe tasks @ 2024-11-08 17:31 Calvin Wan 2024-11-08 17:31 ` [RFC PATCH 1/1] " Calvin Wan 2024-11-11 4:50 ` [RFC PATCH 0/1] " Junio C Hamano 0 siblings, 2 replies; 13+ messages in thread From: Calvin Wan @ 2024-11-08 17:31 UTC (permalink / raw) To: git; +Cc: Calvin Wan, steamdon, emrass, ps, me, stolee Unless a user changes the config options `gc.autoDetach` or `maintenance.autoDetach`, Git by default runs maintenance and gc tasks in the background. This is because maintenance and gc tasks, especially in large repositories, can take a long time to run. Therefore, they _should_ be as nonintrusive as possible to other Git commands, especially porcelain ones. However, this is not the case as discovered earlier[1] -- certain maintenance and gc tasks are not safe to run in parallel with other commands. The consequences of such are that scripts with commands that trigger maintenance/gc can race and crash. Users can also run into unexpected errors from porcelain commands that touch common files such as HEAD.lock, unaware that a background maintenance/gc task is the one holding the lock. As Patrick points out[2], the two unsafe commands are `git reflog expire --all`, invoked by gc, and `git pack-refs --all --prune`, invoked by maintenance. We can create two buckets for subtasks -- one for async safe tasks and one for async unsafe tasks. When `[maintenance, gc].autoDetach` is not set or set to true, maintenance will run the unsafe tasks first before detaching to run the safe tasks. This series is in RFC to see if the general direction of the patch is going in the right direction. I left a couple of WIPs in the first patch documenting what still needs to be done if the direction is palatable. [1] https://lore.kernel.org/git/CAFySSZBCKUiY5DO3fz340a0dTb0zUDNKxaTYU0LAqsBD2RMwSg@mail.gmail.com/ [2] https://lore.kernel.org/git/ZxeilMDwq0Z3krhz@pks.im Calvin Wan (1): maintenance: separate parallelism safe and unsafe tasks builtin/gc.c | 173 ++++++++++++++++++++++++++++++++++++----- t/t7900-maintenance.sh | 24 +++--- 2 files changed, 168 insertions(+), 29 deletions(-) -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-08 17:31 [RFC PATCH 0/1] maintenance: separate parallelism safe and unsafe tasks Calvin Wan @ 2024-11-08 17:31 ` Calvin Wan 2024-11-11 7:07 ` Patrick Steinhardt ` (2 more replies) 2024-11-11 4:50 ` [RFC PATCH 0/1] " Junio C Hamano 1 sibling, 3 replies; 13+ messages in thread From: Calvin Wan @ 2024-11-08 17:31 UTC (permalink / raw) To: git; +Cc: Calvin Wan, steamdon, emrass, ps, me, stolee Certain maintenance tasks and subtasks within gc are unsafe to run in parallel with other commands because they lock up files such as HEAD. Therefore, tasks are marked whether they are async safe or not. Async unsafe tasks are run first in the same process before running async safe tasks in parallel. Since the gc task is partially safe, there are two new tasks -- an async safe gc task and an async unsafe gc task. In order to properly invoke this in gc, `--run-async-safe` and `--run-async-unsafe` have been added as options to gc. Maintenance will only run these two new tasks if it was set to detach, otherwise the original gc task runs. Additionally, if a user passes in tasks thru `--task`, we do not attempt to run separate async/sync tasks since the user sets the order of tasks. WIP: automatically run gc unsafe tasks when gc is invoked but not from maintenance WIP: edit test in t7900-maintainance.sh to match new functionality WIP: add additional documentation for new options and functionality Signed-off-by: Calvin Wan <calvinwan@google.com> --- builtin/gc.c | 173 ++++++++++++++++++++++++++++++++++++----- t/t7900-maintenance.sh | 24 +++--- 2 files changed, 168 insertions(+), 29 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index d52735354c..375d304c42 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -668,6 +668,8 @@ struct repository *repo UNUSED) pid_t pid; int daemonized = 0; int keep_largest_pack = -1; + int run_async_safe = 0; + int run_async_unsafe = 0; timestamp_t dummy; struct child_process rerere_cmd = CHILD_PROCESS_INIT; struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT; @@ -694,6 +696,10 @@ struct repository *repo UNUSED) PARSE_OPT_NOCOMPLETE), OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack, N_("repack all other packs except the largest pack")), + OPT_BOOL(0, "run-async-safe", &run_async_safe, + N_("run only background safe gc tasks, should only be invoked thru maintenance")), + OPT_BOOL(0, "run-async-unsafe", &run_async_unsafe, + N_("run only background unsafe gc tasks, should only be invoked thru maintenance")), OPT_END() }; @@ -718,6 +724,9 @@ struct repository *repo UNUSED) builtin_gc_usage, 0); if (argc > 0) usage_with_options(builtin_gc_usage, builtin_gc_options); + + if (run_async_safe && run_async_unsafe) + die(_("--run-async-safe cannot be used with --run-async-unsafe")); if (prune_expire_arg != prune_expire_sentinel) { free(cfg.prune_expire); @@ -815,7 +824,12 @@ struct repository *repo UNUSED) atexit(process_log_file_at_exit); } - gc_before_repack(&opts, &cfg); + if (run_async_unsafe) { + gc_before_repack(&opts, &cfg); + goto out; + } else if (!run_async_safe) + gc_before_repack(&opts, &cfg); + if (!repository_format_precious_objects) { struct child_process repack_cmd = CHILD_PROCESS_INIT; @@ -1052,6 +1066,46 @@ static int maintenance_task_prefetch(struct maintenance_run_opts *opts, return 0; } +static int maintenance_task_unsafe_gc(struct maintenance_run_opts *opts, + struct gc_config *cfg UNUSED) +{ + struct child_process child = CHILD_PROCESS_INIT; + + child.git_cmd = child.close_object_store = 1; + strvec_push(&child.args, "gc"); + + if (opts->auto_flag) + strvec_push(&child.args, "--auto"); + if (opts->quiet) + strvec_push(&child.args, "--quiet"); + else + strvec_push(&child.args, "--no-quiet"); + strvec_push(&child.args, "--no-detach"); + strvec_push(&child.args, "--run-async-unsafe"); + + return run_command(&child); +} + +static int maintenance_task_safe_gc(struct maintenance_run_opts *opts, + struct gc_config *cfg UNUSED) +{ + struct child_process child = CHILD_PROCESS_INIT; + + child.git_cmd = child.close_object_store = 1; + strvec_push(&child.args, "gc"); + + if (opts->auto_flag) + strvec_push(&child.args, "--auto"); + if (opts->quiet) + strvec_push(&child.args, "--quiet"); + else + strvec_push(&child.args, "--no-quiet"); + strvec_push(&child.args, "--no-detach"); + strvec_push(&child.args, "--run-async-safe"); + + return run_command(&child); +} + static int maintenance_task_gc(struct maintenance_run_opts *opts, struct gc_config *cfg UNUSED) { @@ -1350,6 +1404,7 @@ struct maintenance_task { const char *name; maintenance_task_fn *fn; maintenance_auto_fn *auto_condition; + unsigned daemonize_safe; unsigned enabled:1; enum schedule_priority schedule; @@ -1362,6 +1417,8 @@ enum maintenance_task_label { TASK_PREFETCH, TASK_LOOSE_OBJECTS, TASK_INCREMENTAL_REPACK, + TASK_UNSAFE_GC, + TASK_SAFE_GC, TASK_GC, TASK_COMMIT_GRAPH, TASK_PACK_REFS, @@ -1370,36 +1427,62 @@ enum maintenance_task_label { TASK__COUNT }; +enum maintenance_task_daemonize_safe { + UNSAFE, + SAFE, +}; + static struct maintenance_task tasks[] = { [TASK_PREFETCH] = { "prefetch", maintenance_task_prefetch, + NULL, + SAFE, }, [TASK_LOOSE_OBJECTS] = { "loose-objects", maintenance_task_loose_objects, loose_object_auto_condition, + SAFE, }, [TASK_INCREMENTAL_REPACK] = { "incremental-repack", maintenance_task_incremental_repack, incremental_repack_auto_condition, + SAFE, + }, + [TASK_UNSAFE_GC] = { + "unsafe-gc", + maintenance_task_unsafe_gc, + need_to_gc, + UNSAFE, + 0, + }, + [TASK_SAFE_GC] = { + "safe-gc", + maintenance_task_safe_gc, + need_to_gc, + SAFE, + 0, }, [TASK_GC] = { "gc", maintenance_task_gc, need_to_gc, + UNSAFE, 1, }, [TASK_COMMIT_GRAPH] = { "commit-graph", maintenance_task_commit_graph, should_write_commit_graph, + SAFE, }, [TASK_PACK_REFS] = { "pack-refs", maintenance_task_pack_refs, pack_refs_condition, + UNSAFE, }, }; @@ -1411,10 +1494,18 @@ static int compare_tasks_by_selection(const void *a_, const void *b_) return b->selected_order - a->selected_order; } +static int compare_tasks_by_safeness(const void *a_, const void *b_) +{ + const struct maintenance_task *a = a_; + const struct maintenance_task *b = b_; + + return a->daemonize_safe - b->daemonize_safe; +} + static int maintenance_run_tasks(struct maintenance_run_opts *opts, struct gc_config *cfg) { - int i, found_selected = 0; + int i, j, found_selected = 0; int result = 0; struct lock_file lk; struct repository *r = the_repository; @@ -1436,6 +1527,57 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts, } free(lock_path); + for (i = 0; !found_selected && i < TASK__COUNT; i++) + found_selected = tasks[i].selected_order >= 0; + + if (found_selected) + QSORT(tasks, TASK__COUNT, compare_tasks_by_selection); + else if (opts->detach > 0) { + /* + * Part of gc is unsafe to run in the background, so + * separate out gc into unsafe and safe tasks. + * + * Run unsafe tasks, including unsafe maintenance tasks, + * before daemonizing and running safe tasks. + */ + + if (tasks[TASK_GC].enabled) { + tasks[TASK_GC].enabled = 0; + tasks[TASK_UNSAFE_GC].enabled = 1; + tasks[TASK_UNSAFE_GC].schedule = tasks[TASK_GC].schedule; + tasks[TASK_SAFE_GC].enabled = 1; + tasks[TASK_SAFE_GC].schedule = tasks[TASK_GC].schedule; + } + + QSORT(tasks, TASK__COUNT, compare_tasks_by_safeness); + + for (j = 0; j < TASK__COUNT; j++) { + if (tasks[j].daemonize_safe == SAFE) + break; + + if (found_selected && tasks[j].selected_order < 0) + continue; + + if (!found_selected && !tasks[j].enabled) + continue; + + if (opts->auto_flag && + (!tasks[j].auto_condition || + !tasks[j].auto_condition(cfg))) + continue; + + if (opts->schedule && tasks[j].schedule < opts->schedule) + continue; + + trace2_region_enter("maintenance", tasks[j].name, r); + if (tasks[j].fn(opts, cfg)) { + error(_("task '%s' failed"), tasks[j].name); + result = 1; + } + trace2_region_leave("maintenance", tasks[j].name, r); + } + } + /* Failure to daemonize is ok, we'll continue in foreground. */ if (opts->detach > 0) { trace2_region_enter("maintenance", "detach", the_repository); @@ -1443,33 +1585,28 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts, trace2_region_leave("maintenance", "detach", the_repository); } - for (i = 0; !found_selected && i < TASK__COUNT; i++) - found_selected = tasks[i].selected_order >= 0; - - if (found_selected) - QSORT(tasks, TASK__COUNT, compare_tasks_by_selection); - - for (i = 0; i < TASK__COUNT; i++) { - if (found_selected && tasks[i].selected_order < 0) + for (j = j; j < TASK__COUNT; j++) { + if (found_selected && tasks[j].selected_order < 0) continue; - if (!found_selected && !tasks[i].enabled) + if (!found_selected && !tasks[j].enabled) continue; if (opts->auto_flag && - (!tasks[i].auto_condition || - !tasks[i].auto_condition(cfg))) + (!tasks[j].auto_condition || + !tasks[j].auto_condition(cfg))) continue; - if (opts->schedule && tasks[i].schedule < opts->schedule) + if (opts->schedule && tasks[j].schedule < opts->schedule) continue; - trace2_region_enter("maintenance", tasks[i].name, r); - if (tasks[i].fn(opts, cfg)) { - error(_("task '%s' failed"), tasks[i].name); + // fprintf(stderr, "running %i: %s\n",j , tasks[j].name); + trace2_region_enter("maintenance", tasks[j].name, r); + if (tasks[j].fn(opts, cfg)) { + error(_("task '%s' failed"), tasks[j].name); result = 1; } - trace2_region_leave("maintenance", tasks[i].name, r); + trace2_region_leave("maintenance", tasks[j].name, r); } rollback_lock_file(&lk); diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index c224c8450c..5bbd07ec30 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -43,17 +43,19 @@ test_expect_success 'help text' ' test_grep "usage: git maintenance" err ' -test_expect_success 'run [--auto|--quiet]' ' - GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \ - git maintenance run 2>/dev/null && - GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \ - git maintenance run --auto 2>/dev/null && - GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \ - git maintenance run --no-quiet 2>/dev/null && - test_subcommand git gc --quiet --no-detach <run-no-auto.txt && - test_subcommand ! git gc --auto --quiet --no-detach <run-auto.txt && - test_subcommand git gc --no-quiet --no-detach <run-no-quiet.txt -' +# This test fails with this series since the gc call is now split up so the traces won't match exactly + +# test_expect_success 'run [--auto|--quiet]' ' +# GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \ +# git maintenance run 2>/dev/null && +# GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \ +# git maintenance run --auto 2>/dev/null && +# GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \ +# git maintenance run --no-quiet 2>/dev/null && +# test_subcommand git gc --quiet --no-detach <run-no-auto.txt && +# test_subcommand ! git gc --auto --quiet --no-detach <run-auto.txt && +# test_subcommand git gc --no-quiet --no-detach <run-no-quiet.txt +# ' test_expect_success 'maintenance.auto config option' ' GIT_TRACE2_EVENT="$(pwd)/default" git commit --quiet --allow-empty -m 1 && -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-08 17:31 ` [RFC PATCH 1/1] " Calvin Wan @ 2024-11-11 7:07 ` Patrick Steinhardt 2024-11-11 18:06 ` Calvin Wan 2024-11-11 8:24 ` Junio C Hamano 2024-11-11 9:14 ` Junio C Hamano 2 siblings, 1 reply; 13+ messages in thread From: Patrick Steinhardt @ 2024-11-11 7:07 UTC (permalink / raw) To: Calvin Wan; +Cc: git, steamdon, emrass, me, stolee On Fri, Nov 08, 2024 at 05:31:12PM +0000, Calvin Wan wrote: > Certain maintenance tasks and subtasks within gc are unsafe to run in > parallel with other commands because they lock up files such as > HEAD. I don't think it is fair to classify this as "unsafe". Nothing is unsafe here: we take locks to guard us against concurrent modifications. What you're having problems with is the fact that this safety mechanism works as expected and keeps other processes from modifying locked the data. > Therefore, tasks are marked whether they are async safe or > not. Async unsafe tasks are run first in the same process before running > async safe tasks in parallel. > > Since the gc task is partially safe, there are two new tasks -- an async > safe gc task and an async unsafe gc task. In order to properly invoke > this in gc, `--run-async-safe` and `--run-async-unsafe` have been added > as options to gc. Maintenance will only run these two new tasks if it > was set to detach, otherwise the original gc task runs. > > Additionally, if a user passes in tasks thru `--task`, we do not attempt > to run separate async/sync tasks since the user sets the order of tasks. > > WIP: automatically run gc unsafe tasks when gc is invoked but not from > maintenance > WIP: edit test in t7900-maintainance.sh to match new functionality > WIP: add additional documentation for new options and functionality > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > builtin/gc.c | 173 ++++++++++++++++++++++++++++++++++++----- > t/t7900-maintenance.sh | 24 +++--- > 2 files changed, 168 insertions(+), 29 deletions(-) > > diff --git a/builtin/gc.c b/builtin/gc.c > index d52735354c..375d304c42 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c It might make sense to split out the git-gc(1) changes into a preparatory commit with its own set of tests. > @@ -815,7 +824,12 @@ struct repository *repo UNUSED) > atexit(process_log_file_at_exit); > } > > - gc_before_repack(&opts, &cfg); > + if (run_async_unsafe) { > + gc_before_repack(&opts, &cfg); > + goto out; > + } else if (!run_async_safe) > + gc_before_repack(&opts, &cfg); > + > Style: there should be curly braces around the `else if` here. > if (!repository_format_precious_objects) { > struct child_process repack_cmd = CHILD_PROCESS_INIT; > @@ -1052,6 +1066,46 @@ static int maintenance_task_prefetch(struct maintenance_run_opts *opts, > return 0; > } > > +static int maintenance_task_unsafe_gc(struct maintenance_run_opts *opts, > + struct gc_config *cfg UNUSED) > +{ > + struct child_process child = CHILD_PROCESS_INIT; > + > + child.git_cmd = child.close_object_store = 1; > + strvec_push(&child.args, "gc"); > + > + if (opts->auto_flag) > + strvec_push(&child.args, "--auto"); > + if (opts->quiet) > + strvec_push(&child.args, "--quiet"); > + else > + strvec_push(&child.args, "--no-quiet"); > + strvec_push(&child.args, "--no-detach"); > + strvec_push(&child.args, "--run-async-unsafe"); > + > + return run_command(&child); > +} > + > +static int maintenance_task_safe_gc(struct maintenance_run_opts *opts, > + struct gc_config *cfg UNUSED) > +{ > + struct child_process child = CHILD_PROCESS_INIT; > + > + child.git_cmd = child.close_object_store = 1; > + strvec_push(&child.args, "gc"); > + > + if (opts->auto_flag) > + strvec_push(&child.args, "--auto"); > + if (opts->quiet) > + strvec_push(&child.args, "--quiet"); > + else > + strvec_push(&child.args, "--no-quiet"); > + strvec_push(&child.args, "--no-detach"); > + strvec_push(&child.args, "--run-async-safe"); > + > + return run_command(&child); > +} These two functions and `maintenance_task_gc()` all look exactly the same. We should deduplicate them. > static int maintenance_task_gc(struct maintenance_run_opts *opts, > struct gc_config *cfg UNUSED) > { > @@ -1350,6 +1404,7 @@ struct maintenance_task { > const char *name; > maintenance_task_fn *fn; > maintenance_auto_fn *auto_condition; > + unsigned daemonize_safe; We can use the enum here to give readers a better hint what this variable is about. > unsigned enabled:1; > > enum schedule_priority schedule; > @@ -1362,6 +1417,8 @@ enum maintenance_task_label { > TASK_PREFETCH, > TASK_LOOSE_OBJECTS, > TASK_INCREMENTAL_REPACK, > + TASK_UNSAFE_GC, > + TASK_SAFE_GC, > TASK_GC, > TASK_COMMIT_GRAPH, > TASK_PACK_REFS, > @@ -1370,36 +1427,62 @@ enum maintenance_task_label { > TASK__COUNT > }; > > +enum maintenance_task_daemonize_safe { > + UNSAFE, > + SAFE, > +}; These names can conflict quite fast. Do we maybe want to rename them to e.g. `MAINTENANCE_TASK_DAEMONIZE_(SAFE|UNSAFE)`? > static struct maintenance_task tasks[] = { > [TASK_PREFETCH] = { > "prefetch", > maintenance_task_prefetch, > + NULL, > + SAFE, > }, It might make sense to prepare these to take designated field initializers in a preparatory commit. > [TASK_LOOSE_OBJECTS] = { > "loose-objects", > maintenance_task_loose_objects, > loose_object_auto_condition, > + SAFE, > }, > [TASK_INCREMENTAL_REPACK] = { > "incremental-repack", > maintenance_task_incremental_repack, > incremental_repack_auto_condition, > + SAFE, > + }, > + [TASK_UNSAFE_GC] = { > + "unsafe-gc", > + maintenance_task_unsafe_gc, > + need_to_gc, > + UNSAFE, > + 0, > + }, > + [TASK_SAFE_GC] = { > + "safe-gc", > + maintenance_task_safe_gc, > + need_to_gc, > + SAFE, > + 0, > }, Hm. I wonder whether we really want to expose additional tasks to address the issue, which feels like we're leaking implementation details to our users. Would it maybe be preferable to instead introduce a new optional callback function for every task that handles the pre-detach logic? I wonder whether we also have to adapt the "pack-refs" task to be synchronous instead of asynchronous? > @@ -1436,6 +1527,57 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts, > } > free(lock_path); > > + for (i = 0; !found_selected && i < TASK__COUNT; i++) > + found_selected = tasks[i].selected_order >= 0; > + > + if (found_selected) > + QSORT(tasks, TASK__COUNT, compare_tasks_by_selection); > + else if (opts->detach > 0) { > + /* > + * Part of gc is unsafe to run in the background, so > + * separate out gc into unsafe and safe tasks. > + * > + * Run unsafe tasks, including unsafe maintenance tasks, > + * before daemonizing and running safe tasks. > + */ > + > + if (tasks[TASK_GC].enabled) { > + tasks[TASK_GC].enabled = 0; > + tasks[TASK_UNSAFE_GC].enabled = 1; > + tasks[TASK_UNSAFE_GC].schedule = tasks[TASK_GC].schedule; > + tasks[TASK_SAFE_GC].enabled = 1; > + tasks[TASK_SAFE_GC].schedule = tasks[TASK_GC].schedule; > + } If we did the above, then we could also get rid of the complexity here. Thanks! Patrick ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-11 7:07 ` Patrick Steinhardt @ 2024-11-11 18:06 ` Calvin Wan 2024-11-12 6:28 ` Patrick Steinhardt 0 siblings, 1 reply; 13+ messages in thread From: Calvin Wan @ 2024-11-11 18:06 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, steamdon, emrass, me, stolee On Sun, Nov 10, 2024 at 11:07 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Fri, Nov 08, 2024 at 05:31:12PM +0000, Calvin Wan wrote: > > Certain maintenance tasks and subtasks within gc are unsafe to run in > > parallel with other commands because they lock up files such as > > HEAD. > > I don't think it is fair to classify this as "unsafe". Nothing is unsafe > here: we take locks to guard us against concurrent modifications. > What you're having problems with is the fact that this safety mechanism > works as expected and keeps other processes from modifying locked the > data. > > > Therefore, tasks are marked whether they are async safe or > > not. Async unsafe tasks are run first in the same process before running > > async safe tasks in parallel. > > > > Since the gc task is partially safe, there are two new tasks -- an async > > safe gc task and an async unsafe gc task. In order to properly invoke > > this in gc, `--run-async-safe` and `--run-async-unsafe` have been added > > as options to gc. Maintenance will only run these two new tasks if it > > was set to detach, otherwise the original gc task runs. > > > > Additionally, if a user passes in tasks thru `--task`, we do not attempt > > to run separate async/sync tasks since the user sets the order of tasks. > > > > WIP: automatically run gc unsafe tasks when gc is invoked but not from > > maintenance > > WIP: edit test in t7900-maintainance.sh to match new functionality > > WIP: add additional documentation for new options and functionality > > > > Signed-off-by: Calvin Wan <calvinwan@google.com> > > --- > > builtin/gc.c | 173 ++++++++++++++++++++++++++++++++++++----- > > t/t7900-maintenance.sh | 24 +++--- > > 2 files changed, 168 insertions(+), 29 deletions(-) > > > > diff --git a/builtin/gc.c b/builtin/gc.c > > index d52735354c..375d304c42 100644 > > --- a/builtin/gc.c > > +++ b/builtin/gc.c > > It might make sense to split out the git-gc(1) changes into a > preparatory commit with its own set of tests. > > > @@ -815,7 +824,12 @@ struct repository *repo UNUSED) > > atexit(process_log_file_at_exit); > > } > > > > - gc_before_repack(&opts, &cfg); > > + if (run_async_unsafe) { > > + gc_before_repack(&opts, &cfg); > > + goto out; > > + } else if (!run_async_safe) > > + gc_before_repack(&opts, &cfg); > > + > > > > Style: there should be curly braces around the `else if` here. > > > if (!repository_format_precious_objects) { > > struct child_process repack_cmd = CHILD_PROCESS_INIT; > > @@ -1052,6 +1066,46 @@ static int maintenance_task_prefetch(struct maintenance_run_opts *opts, > > return 0; > > } > > > > +static int maintenance_task_unsafe_gc(struct maintenance_run_opts *opts, > > + struct gc_config *cfg UNUSED) > > +{ > > + struct child_process child = CHILD_PROCESS_INIT; > > + > > + child.git_cmd = child.close_object_store = 1; > > + strvec_push(&child.args, "gc"); > > + > > + if (opts->auto_flag) > > + strvec_push(&child.args, "--auto"); > > + if (opts->quiet) > > + strvec_push(&child.args, "--quiet"); > > + else > > + strvec_push(&child.args, "--no-quiet"); > > + strvec_push(&child.args, "--no-detach"); > > + strvec_push(&child.args, "--run-async-unsafe"); > > + > > + return run_command(&child); > > +} > > + > > +static int maintenance_task_safe_gc(struct maintenance_run_opts *opts, > > + struct gc_config *cfg UNUSED) > > +{ > > + struct child_process child = CHILD_PROCESS_INIT; > > + > > + child.git_cmd = child.close_object_store = 1; > > + strvec_push(&child.args, "gc"); > > + > > + if (opts->auto_flag) > > + strvec_push(&child.args, "--auto"); > > + if (opts->quiet) > > + strvec_push(&child.args, "--quiet"); > > + else > > + strvec_push(&child.args, "--no-quiet"); > > + strvec_push(&child.args, "--no-detach"); > > + strvec_push(&child.args, "--run-async-safe"); > > + > > + return run_command(&child); > > +} > > These two functions and `maintenance_task_gc()` all look exactly the > same. We should deduplicate them. > > > static int maintenance_task_gc(struct maintenance_run_opts *opts, > > struct gc_config *cfg UNUSED) > > { > > @@ -1350,6 +1404,7 @@ struct maintenance_task { > > const char *name; > > maintenance_task_fn *fn; > > maintenance_auto_fn *auto_condition; > > + unsigned daemonize_safe; > > We can use the enum here to give readers a better hint what this > variable is about. > > > unsigned enabled:1; > > > > enum schedule_priority schedule; > > @@ -1362,6 +1417,8 @@ enum maintenance_task_label { > > TASK_PREFETCH, > > TASK_LOOSE_OBJECTS, > > TASK_INCREMENTAL_REPACK, > > + TASK_UNSAFE_GC, > > + TASK_SAFE_GC, > > TASK_GC, > > TASK_COMMIT_GRAPH, > > TASK_PACK_REFS, > > @@ -1370,36 +1427,62 @@ enum maintenance_task_label { > > TASK__COUNT > > }; > > > > +enum maintenance_task_daemonize_safe { > > + UNSAFE, > > + SAFE, > > +}; > > These names can conflict quite fast. Do we maybe want to rename them to > e.g. `MAINTENANCE_TASK_DAEMONIZE_(SAFE|UNSAFE)`? > > > static struct maintenance_task tasks[] = { > > [TASK_PREFETCH] = { > > "prefetch", > > maintenance_task_prefetch, > > + NULL, > > + SAFE, > > }, > > It might make sense to prepare these to take designated field > initializers in a preparatory commit. Thanks for all the stylistic feedback. I agree much of this can be cleaned up to be simpler, but I sent this as an RFC to gather feedback on whether this patch directionally made sense. Will clean everything up in the v1. > > > [TASK_LOOSE_OBJECTS] = { > > "loose-objects", > > maintenance_task_loose_objects, > > loose_object_auto_condition, > > + SAFE, > > }, > > [TASK_INCREMENTAL_REPACK] = { > > "incremental-repack", > > maintenance_task_incremental_repack, > > incremental_repack_auto_condition, > > + SAFE, > > + }, > > + [TASK_UNSAFE_GC] = { > > + "unsafe-gc", > > + maintenance_task_unsafe_gc, > > + need_to_gc, > > + UNSAFE, > > + 0, > > + }, > > + [TASK_SAFE_GC] = { > > + "safe-gc", > > + maintenance_task_safe_gc, > > + need_to_gc, > > + SAFE, > > + 0, > > }, > > Hm. I wonder whether we really want to expose additional tasks to > address the issue, which feels like we're leaking implementation details > to our users. Would it maybe be preferable to instead introduce a new > optional callback function for every task that handles the pre-detach > logic? This does sound like a good idea. However, would there be any issue with running all pre-detach logic before running post-detach logic? I'm thinking if pre-detach logic from a different function could affect post-detach logic from another. If not, I do agree this would be the best solution going forward. > I wonder whether we also have to adapt the "pack-refs" task to be > synchronous instead of asynchronous? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-11 18:06 ` Calvin Wan @ 2024-11-12 6:28 ` Patrick Steinhardt 2024-11-15 20:13 ` Calvin Wan 0 siblings, 1 reply; 13+ messages in thread From: Patrick Steinhardt @ 2024-11-12 6:28 UTC (permalink / raw) To: Calvin Wan; +Cc: git, steamdon, emrass, me, stolee On Mon, Nov 11, 2024 at 10:06:10AM -0800, Calvin Wan wrote: > On Sun, Nov 10, 2024 at 11:07 PM Patrick Steinhardt <ps@pks.im> wrote: > > > [TASK_LOOSE_OBJECTS] = { > > > "loose-objects", > > > maintenance_task_loose_objects, > > > loose_object_auto_condition, > > > + SAFE, > > > }, > > > [TASK_INCREMENTAL_REPACK] = { > > > "incremental-repack", > > > maintenance_task_incremental_repack, > > > incremental_repack_auto_condition, > > > + SAFE, > > > + }, > > > + [TASK_UNSAFE_GC] = { > > > + "unsafe-gc", > > > + maintenance_task_unsafe_gc, > > > + need_to_gc, > > > + UNSAFE, > > > + 0, > > > + }, > > > + [TASK_SAFE_GC] = { > > > + "safe-gc", > > > + maintenance_task_safe_gc, > > > + need_to_gc, > > > + SAFE, > > > + 0, > > > }, > > > > Hm. I wonder whether we really want to expose additional tasks to > > address the issue, which feels like we're leaking implementation details > > to our users. Would it maybe be preferable to instead introduce a new > > optional callback function for every task that handles the pre-detach > > logic? > > This does sound like a good idea. However, would there be any issue > with running all pre-detach logic before running post-detach logic? > I'm thinking if pre-detach logic from a different function could > affect post-detach logic from another. If not, I do agree this would > be the best solution going forward. Sure, in theory these can interact with each other. But is that any different when you represent this with tasks instead? The conflict would still exist there. It's also not any different to how things work right now: the "gc" task will impact the "repack" task, so configuring them both at the same time does not really make much sense. Patrick ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-12 6:28 ` Patrick Steinhardt @ 2024-11-15 20:13 ` Calvin Wan 2024-11-18 1:32 ` Junio C Hamano 2024-11-18 6:58 ` Patrick Steinhardt 0 siblings, 2 replies; 13+ messages in thread From: Calvin Wan @ 2024-11-15 20:13 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, emrass, me, stolee, Josh Steadmon On Mon, Nov 11, 2024 at 10:29 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Mon, Nov 11, 2024 at 10:06:10AM -0800, Calvin Wan wrote: > > On Sun, Nov 10, 2024 at 11:07 PM Patrick Steinhardt <ps@pks.im> wrote: > > > > > > Hm. I wonder whether we really want to expose additional tasks to > > > address the issue, which feels like we're leaking implementation details > > > to our users. Would it maybe be preferable to instead introduce a new > > > optional callback function for every task that handles the pre-detach > > > logic? > > > > This does sound like a good idea. However, would there be any issue > > with running all pre-detach logic before running post-detach logic? > > I'm thinking if pre-detach logic from a different function could > > affect post-detach logic from another. If not, I do agree this would > > be the best solution going forward. > > Sure, in theory these can interact with each other. But is that any > different when you represent this with tasks instead? The conflict would > still exist there. It's also not any different to how things work right > now: the "gc" task will impact the "repack" task, so configuring them > both at the same time does not really make much sense. No you are correct that this is no different than how these tasks are currently run. However, I have just received some numbers that the repack, when gc'ing in Android, is the longest operation so even if we were able to run repack first in the foreground, ultimately it wouldn't save a significant amount of time compared to running gc entirely in the foreground. I think for now it makes sense to hold off on rerolling this series (at least in the form of auto backgrounding/foregrounding tasks) since the purported benefits currently aren't worth the churn. Thanks again for the comments on this series ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-15 20:13 ` Calvin Wan @ 2024-11-18 1:32 ` Junio C Hamano 2024-11-18 6:58 ` Patrick Steinhardt 1 sibling, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2024-11-18 1:32 UTC (permalink / raw) To: Calvin Wan; +Cc: Patrick Steinhardt, git, emrass, me, stolee, Josh Steadmon Calvin Wan <calvinwan@google.com> writes: > .... I think for now it makes sense to hold off > on rerolling this series (at least in the form of auto > backgrounding/foregrounding tasks) since the purported benefits > currently aren't worth the churn. Thanks again for the comments on > this series So, we'll place this topic on hold, or even eject from the tree to be revisited later? Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-15 20:13 ` Calvin Wan 2024-11-18 1:32 ` Junio C Hamano @ 2024-11-18 6:58 ` Patrick Steinhardt 1 sibling, 0 replies; 13+ messages in thread From: Patrick Steinhardt @ 2024-11-18 6:58 UTC (permalink / raw) To: Calvin Wan; +Cc: git, emrass, me, stolee, Josh Steadmon On Fri, Nov 15, 2024 at 12:13:24PM -0800, Calvin Wan wrote: > On Mon, Nov 11, 2024 at 10:29 PM Patrick Steinhardt <ps@pks.im> wrote: > > > > On Mon, Nov 11, 2024 at 10:06:10AM -0800, Calvin Wan wrote: > > > On Sun, Nov 10, 2024 at 11:07 PM Patrick Steinhardt <ps@pks.im> wrote: > > > > > > > > Hm. I wonder whether we really want to expose additional tasks to > > > > address the issue, which feels like we're leaking implementation details > > > > to our users. Would it maybe be preferable to instead introduce a new > > > > optional callback function for every task that handles the pre-detach > > > > logic? > > > > > > This does sound like a good idea. However, would there be any issue > > > with running all pre-detach logic before running post-detach logic? > > > I'm thinking if pre-detach logic from a different function could > > > affect post-detach logic from another. If not, I do agree this would > > > be the best solution going forward. > > > > Sure, in theory these can interact with each other. But is that any > > different when you represent this with tasks instead? The conflict would > > still exist there. It's also not any different to how things work right > > now: the "gc" task will impact the "repack" task, so configuring them > > both at the same time does not really make much sense. > > No you are correct that this is no different than how these tasks are > currently run. However, I have just received some numbers that the > repack, when gc'ing in Android, is the longest operation so even if we > were able to run repack first in the foreground, ultimately it > wouldn't save a significant amount of time compared to running gc > entirely in the foreground. I think for now it makes sense to hold off > on rerolling this series (at least in the form of auto > backgrounding/foregrounding tasks) since the purported benefits > currently aren't worth the churn. Thanks again for the comments on > this series Wait, I think I'm missing something. Why should the repack run in the foreground? Based on your report the only thing that'd have to run in the foreground are tasks that lock refs, so `git pack-refs` and `git reflog expire`. Patrick ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-08 17:31 ` [RFC PATCH 1/1] " Calvin Wan 2024-11-11 7:07 ` Patrick Steinhardt @ 2024-11-11 8:24 ` Junio C Hamano 2024-11-11 9:14 ` Junio C Hamano 2 siblings, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2024-11-11 8:24 UTC (permalink / raw) To: Calvin Wan; +Cc: git, steamdon, emrass, ps, me, stolee Calvin Wan <calvinwan@google.com> writes: > + for (j = j; j < TASK__COUNT; j++) { This seems to break the build. Here is what I got from my compiler. builtin/gc.c:1588:9: error: explicitly assigning value of variable of type 'int' to itself [-Werror,-Wself-assign] for (j = j; j < TASK__COUNT; j++) { ~ ^ ~ builtin/gc.c:1535:11: error: variable 'j' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] else if (opts->detach > 0) { ^~~~~~~~~~~~~~~~ builtin/gc.c:1588:11: note: uninitialized use occurs here for (j = j; j < TASK__COUNT; j++) { ^ builtin/gc.c:1535:7: note: remove the 'if' if its condition is always true else if (opts->detach > 0) { ^~~~~~~~~~~~~~~~~~~~~~ builtin/gc.c:1533:6: error: variable 'j' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (found_selected) ^~~~~~~~~~~~~~ builtin/gc.c:1588:11: note: uninitialized use occurs here for (j = j; j < TASK__COUNT; j++) { ^ builtin/gc.c:1533:2: note: remove the 'if' if its condition is always false if (found_selected) ^~~~~~~~~~~~~~~~~~~ builtin/gc.c:1508:10: note: initialize the variable 'j' to silence this warning int i, j, found_selected = 0; ^ = 0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-08 17:31 ` [RFC PATCH 1/1] " Calvin Wan 2024-11-11 7:07 ` Patrick Steinhardt 2024-11-11 8:24 ` Junio C Hamano @ 2024-11-11 9:14 ` Junio C Hamano 2024-11-11 18:12 ` Calvin Wan 2 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2024-11-11 9:14 UTC (permalink / raw) To: Calvin Wan; +Cc: git, steamdon, emrass, ps, me, stolee Calvin Wan <calvinwan@google.com> writes: > Since the gc task is partially safe, there are two new tasks -- an async > safe gc task and an async unsafe gc task. In order to properly invoke > this in gc, `--run-async-safe` and `--run-async-unsafe` have been added > as options to gc. Maintenance will only run these two new tasks if it > was set to detach, otherwise the original gc task runs. Would it essentially boil down to ensure that only one "maintenance" is running at a time, and when it is running the "unsafe" part, somehow the end-user MUST be made aware of that fact and told to refrain from touching the repository? > Additionally, if a user passes in tasks thru `--task`, we do not attempt > to run separate async/sync tasks since the user sets the order of tasks. In other words, the rope is long enough that the user can do whatever they want, regardless of what we think the order should be. Which probably makes sense. > diff --git a/builtin/gc.c b/builtin/gc.c > index d52735354c..375d304c42 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -668,6 +668,8 @@ struct repository *repo UNUSED) > pid_t pid; > int daemonized = 0; > int keep_largest_pack = -1; > + int run_async_safe = 0; > + int run_async_unsafe = 0; > timestamp_t dummy; > struct child_process rerere_cmd = CHILD_PROCESS_INIT; > struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT; > @@ -694,6 +696,10 @@ struct repository *repo UNUSED) > PARSE_OPT_NOCOMPLETE), > OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack, > N_("repack all other packs except the largest pack")), > + OPT_BOOL(0, "run-async-safe", &run_async_safe, > + N_("run only background safe gc tasks, should only be invoked thru maintenance")), > + OPT_BOOL(0, "run-async-unsafe", &run_async_unsafe, > + N_("run only background unsafe gc tasks, should only be invoked thru maintenance")), > OPT_END() > }; > > @@ -718,6 +724,9 @@ struct repository *repo UNUSED) > builtin_gc_usage, 0); > if (argc > 0) > usage_with_options(builtin_gc_usage, builtin_gc_options); > + > + if (run_async_safe && run_async_unsafe) > + die(_("--run-async-safe cannot be used with --run-async-unsafe")); So if the caller wants to eventually run both, it has to spawn this program twice, the first time with one option and the second time with the other option? Somehow it feels a bit unsatisfying. If the caller says "I want to run both classes", and if your safe/unsafe classification system knows which task belongs to which class and the classification system that unsafe ones should be run before safe ones (or whatever), wouldn't it be easier to use for the caller to be able to say "run both", and let your classification system take care of the ordering? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-11 9:14 ` Junio C Hamano @ 2024-11-11 18:12 ` Calvin Wan 0 siblings, 0 replies; 13+ messages in thread From: Calvin Wan @ 2024-11-11 18:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, emrass, ps, me, stolee, Josh Steadmon On Mon, Nov 11, 2024 at 1:15 AM Junio C Hamano <gitster@pobox.com> wrote: > > So if the caller wants to eventually run both, it has to spawn this > program twice, the first time with one option and the second time > with the other option? Somehow it feels a bit unsatisfying. If the > caller says "I want to run both classes", and if your safe/unsafe > classification system knows which task belongs to which class and > the classification system that unsafe ones should be run before safe > ones (or whatever), wouldn't it be easier to use for the caller to > be able to say "run both", and let your classification system take > care of the ordering? Yes this felt unsatisfactory to me as well for now. The issue is that, when invoked from maintenance, whether gc is detached or not is dependent on maintenance being detached or not. That is why in the commit message of the first patch, I left a "WIP: automatically run gc unsafe tasks when gc is invoked but not from maintenance". When I turn this into v1, however, I do intend on making this satisfactory. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-08 17:31 [RFC PATCH 0/1] maintenance: separate parallelism safe and unsafe tasks Calvin Wan 2024-11-08 17:31 ` [RFC PATCH 1/1] " Calvin Wan @ 2024-11-11 4:50 ` Junio C Hamano 2024-11-11 18:39 ` Calvin Wan 1 sibling, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2024-11-11 4:50 UTC (permalink / raw) To: Calvin Wan; +Cc: git, steamdon, emrass, ps, me, stolee Calvin Wan <calvinwan@google.com> writes: > However, this is not the case as discovered earlier[1] -- certain > maintenance and gc tasks are not safe to run in parallel with other > commands. The consequences of such are that scripts with commands that > trigger maintenance/gc can race and crash. > > Users can also run into > unexpected errors from porcelain commands that touch common files such > as HEAD.lock, unaware that a background maintenance/gc task is the one > holding the lock. The symptom looks more like a controlled refusal of execution than a crash. > As Patrick points out[2], the two unsafe commands are `git reflog expire > --all`, invoked by gc, and `git pack-refs --all --prune`, invoked by > maintenance. We can create two buckets for subtasks -- one for async > safe tasks and one for async unsafe tasks. I am not sure if they can be partitioned into black and white, but let's see. > This series is in RFC to see if the general direction of the patch is > going in the right direction. I left a couple of WIPs in the first patch > documenting what still needs to be done if the direction is palatable. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/1] maintenance: separate parallelism safe and unsafe tasks 2024-11-11 4:50 ` [RFC PATCH 0/1] " Junio C Hamano @ 2024-11-11 18:39 ` Calvin Wan 0 siblings, 0 replies; 13+ messages in thread From: Calvin Wan @ 2024-11-11 18:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, emrass, ps, me, stolee, Josh Steadmon On Sun, Nov 10, 2024 at 8:50 PM Junio C Hamano <gitster@pobox.com> wrote: > > Calvin Wan <calvinwan@google.com> writes: > > > However, this is not the case as discovered earlier[1] -- certain > > maintenance and gc tasks are not safe to run in parallel with other > > commands. The consequences of such are that scripts with commands that > > trigger maintenance/gc can race and crash. > > > > Users can also run into > > unexpected errors from porcelain commands that touch common files such > > as HEAD.lock, unaware that a background maintenance/gc task is the one > > holding the lock. > > The symptom looks more like a controlled refusal of execution than a > crash. Correct, I also do think some responsibility should be on the scripts to be able to both handle the "controlled refusal of execution" as well as checking to see if those lock files exist first to avoid such errors. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-11-18 6:59 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-08 17:31 [RFC PATCH 0/1] maintenance: separate parallelism safe and unsafe tasks Calvin Wan 2024-11-08 17:31 ` [RFC PATCH 1/1] " Calvin Wan 2024-11-11 7:07 ` Patrick Steinhardt 2024-11-11 18:06 ` Calvin Wan 2024-11-12 6:28 ` Patrick Steinhardt 2024-11-15 20:13 ` Calvin Wan 2024-11-18 1:32 ` Junio C Hamano 2024-11-18 6:58 ` Patrick Steinhardt 2024-11-11 8:24 ` Junio C Hamano 2024-11-11 9:14 ` Junio C Hamano 2024-11-11 18:12 ` Calvin Wan 2024-11-11 4:50 ` [RFC PATCH 0/1] " Junio C Hamano 2024-11-11 18:39 ` Calvin Wan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).