From: Calvin Wan <calvinwan@google.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, steamdon@google.com, emrass@google.com,
me@ttaylorr.com, stolee@gmail.com
Subject: Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks
Date: Mon, 11 Nov 2024 10:06:10 -0800 [thread overview]
Message-ID: <CAFySSZCzxfqpMWH5ORv8fYb7f5WU3Fc2N99fW33wD9JOcYVrVA@mail.gmail.com> (raw)
In-Reply-To: <ZzGtD4Jz9Wj6n0zH@pks.im>
On Sun, Nov 10, 2024 at 11:07 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Fri, Nov 08, 2024 at 05:31:12PM +0000, Calvin Wan wrote:
> > Certain maintenance tasks and subtasks within gc are unsafe to run in
> > parallel with other commands because they lock up files such as
> > HEAD.
>
> I don't think it is fair to classify this as "unsafe". Nothing is unsafe
> here: we take locks to guard us against concurrent modifications.
> What you're having problems with is the fact that this safety mechanism
> works as expected and keeps other processes from modifying locked the
> data.
>
> > Therefore, tasks are marked whether they are async safe or
> > not. Async unsafe tasks are run first in the same process before running
> > async safe tasks in parallel.
> >
> > Since the gc task is partially safe, there are two new tasks -- an async
> > safe gc task and an async unsafe gc task. In order to properly invoke
> > this in gc, `--run-async-safe` and `--run-async-unsafe` have been added
> > as options to gc. Maintenance will only run these two new tasks if it
> > was set to detach, otherwise the original gc task runs.
> >
> > Additionally, if a user passes in tasks thru `--task`, we do not attempt
> > to run separate async/sync tasks since the user sets the order of tasks.
> >
> > WIP: automatically run gc unsafe tasks when gc is invoked but not from
> > maintenance
> > WIP: edit test in t7900-maintainance.sh to match new functionality
> > WIP: add additional documentation for new options and functionality
> >
> > Signed-off-by: Calvin Wan <calvinwan@google.com>
> > ---
> > builtin/gc.c | 173 ++++++++++++++++++++++++++++++++++++-----
> > t/t7900-maintenance.sh | 24 +++---
> > 2 files changed, 168 insertions(+), 29 deletions(-)
> >
> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index d52735354c..375d304c42 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
>
> It might make sense to split out the git-gc(1) changes into a
> preparatory commit with its own set of tests.
>
> > @@ -815,7 +824,12 @@ struct repository *repo UNUSED)
> > atexit(process_log_file_at_exit);
> > }
> >
> > - gc_before_repack(&opts, &cfg);
> > + if (run_async_unsafe) {
> > + gc_before_repack(&opts, &cfg);
> > + goto out;
> > + } else if (!run_async_safe)
> > + gc_before_repack(&opts, &cfg);
> > +
> >
>
> Style: there should be curly braces around the `else if` here.
>
> > if (!repository_format_precious_objects) {
> > struct child_process repack_cmd = CHILD_PROCESS_INIT;
> > @@ -1052,6 +1066,46 @@ static int maintenance_task_prefetch(struct maintenance_run_opts *opts,
> > return 0;
> > }
> >
> > +static int maintenance_task_unsafe_gc(struct maintenance_run_opts *opts,
> > + struct gc_config *cfg UNUSED)
> > +{
> > + struct child_process child = CHILD_PROCESS_INIT;
> > +
> > + child.git_cmd = child.close_object_store = 1;
> > + strvec_push(&child.args, "gc");
> > +
> > + if (opts->auto_flag)
> > + strvec_push(&child.args, "--auto");
> > + if (opts->quiet)
> > + strvec_push(&child.args, "--quiet");
> > + else
> > + strvec_push(&child.args, "--no-quiet");
> > + strvec_push(&child.args, "--no-detach");
> > + strvec_push(&child.args, "--run-async-unsafe");
> > +
> > + return run_command(&child);
> > +}
> > +
> > +static int maintenance_task_safe_gc(struct maintenance_run_opts *opts,
> > + struct gc_config *cfg UNUSED)
> > +{
> > + struct child_process child = CHILD_PROCESS_INIT;
> > +
> > + child.git_cmd = child.close_object_store = 1;
> > + strvec_push(&child.args, "gc");
> > +
> > + if (opts->auto_flag)
> > + strvec_push(&child.args, "--auto");
> > + if (opts->quiet)
> > + strvec_push(&child.args, "--quiet");
> > + else
> > + strvec_push(&child.args, "--no-quiet");
> > + strvec_push(&child.args, "--no-detach");
> > + strvec_push(&child.args, "--run-async-safe");
> > +
> > + return run_command(&child);
> > +}
>
> These two functions and `maintenance_task_gc()` all look exactly the
> same. We should deduplicate them.
>
> > static int maintenance_task_gc(struct maintenance_run_opts *opts,
> > struct gc_config *cfg UNUSED)
> > {
> > @@ -1350,6 +1404,7 @@ struct maintenance_task {
> > const char *name;
> > maintenance_task_fn *fn;
> > maintenance_auto_fn *auto_condition;
> > + unsigned daemonize_safe;
>
> We can use the enum here to give readers a better hint what this
> variable is about.
>
> > unsigned enabled:1;
> >
> > enum schedule_priority schedule;
> > @@ -1362,6 +1417,8 @@ enum maintenance_task_label {
> > TASK_PREFETCH,
> > TASK_LOOSE_OBJECTS,
> > TASK_INCREMENTAL_REPACK,
> > + TASK_UNSAFE_GC,
> > + TASK_SAFE_GC,
> > TASK_GC,
> > TASK_COMMIT_GRAPH,
> > TASK_PACK_REFS,
> > @@ -1370,36 +1427,62 @@ enum maintenance_task_label {
> > TASK__COUNT
> > };
> >
> > +enum maintenance_task_daemonize_safe {
> > + UNSAFE,
> > + SAFE,
> > +};
>
> These names can conflict quite fast. Do we maybe want to rename them to
> e.g. `MAINTENANCE_TASK_DAEMONIZE_(SAFE|UNSAFE)`?
>
> > static struct maintenance_task tasks[] = {
> > [TASK_PREFETCH] = {
> > "prefetch",
> > maintenance_task_prefetch,
> > + NULL,
> > + SAFE,
> > },
>
> It might make sense to prepare these to take designated field
> initializers in a preparatory commit.
Thanks for all the stylistic feedback. I agree much of this can be
cleaned up to be simpler, but I sent this as an RFC to gather feedback
on whether this patch directionally made sense. Will clean everything
up in the v1.
>
> > [TASK_LOOSE_OBJECTS] = {
> > "loose-objects",
> > maintenance_task_loose_objects,
> > loose_object_auto_condition,
> > + SAFE,
> > },
> > [TASK_INCREMENTAL_REPACK] = {
> > "incremental-repack",
> > maintenance_task_incremental_repack,
> > incremental_repack_auto_condition,
> > + SAFE,
> > + },
> > + [TASK_UNSAFE_GC] = {
> > + "unsafe-gc",
> > + maintenance_task_unsafe_gc,
> > + need_to_gc,
> > + UNSAFE,
> > + 0,
> > + },
> > + [TASK_SAFE_GC] = {
> > + "safe-gc",
> > + maintenance_task_safe_gc,
> > + need_to_gc,
> > + SAFE,
> > + 0,
> > },
>
> Hm. I wonder whether we really want to expose additional tasks to
> address the issue, which feels like we're leaking implementation details
> to our users. Would it maybe be preferable to instead introduce a new
> optional callback function for every task that handles the pre-detach
> logic?
This does sound like a good idea. However, would there be any issue
with running all pre-detach logic before running post-detach logic?
I'm thinking if pre-detach logic from a different function could
affect post-detach logic from another. If not, I do agree this would
be the best solution going forward.
> I wonder whether we also have to adapt the "pack-refs" task to be
> synchronous instead of asynchronous?
next prev parent reply other threads:[~2024-11-11 18:06 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-08 17:31 [RFC PATCH 0/1] maintenance: separate parallelism safe and unsafe tasks Calvin Wan
2024-11-08 17:31 ` [RFC PATCH 1/1] " Calvin Wan
2024-11-11 7:07 ` Patrick Steinhardt
2024-11-11 18:06 ` Calvin Wan [this message]
2024-11-12 6:28 ` Patrick Steinhardt
2024-11-15 20:13 ` Calvin Wan
2024-11-18 1:32 ` Junio C Hamano
2024-11-18 6:58 ` Patrick Steinhardt
2024-11-11 8:24 ` Junio C Hamano
2024-11-11 9:14 ` Junio C Hamano
2024-11-11 18:12 ` Calvin Wan
2024-11-11 4:50 ` [RFC PATCH 0/1] " Junio C Hamano
2024-11-11 18:39 ` Calvin Wan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFySSZCzxfqpMWH5ORv8fYb7f5WU3Fc2N99fW33wD9JOcYVrVA@mail.gmail.com \
--to=calvinwan@google.com \
--cc=emrass@google.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=ps@pks.im \
--cc=steamdon@google.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).