From: Josh Steadmon <steadmon@google.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Derrick Stolee <stolee@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, peff@peff.net, jrnieder@google.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 00/15] [RFC] Maintenance jobs and job runner
Date: Wed, 27 May 2020 15:39:07 -0700
Message-ID: <20200527223907.GB65111@google.com>
In-Reply-To: <20200408000149.GN6369@camp.crustytoothpaste.net>

I'm late to the discussion, but I'd like to chime in here too.


On 2020.04.08 00:01, brian m. carlson wrote:
> Hey,
> 
> On 2020-04-07 at 22:23:43, Johannes Schindelin wrote:
> > > If there are periodic tasks that should be done, even if only on large
> > > repos, then let's have a git gc --periodic that does them.  I'm not sure
> > > that fetch should be in that set, but nothing prevents users from doing
> > > "git fetch origin && git gc --periodic".
> > 
> > Hmm. Who says that maintenance tasks are essentially only `gc`? With
> > _maaaaaybe_ a `fetch` thrown in?
> 
> What I'm saying is that we have a tool to run maintenance tasks on the
> repository.  If we need to perform additional maintenance tasks, let's
> put them in the same place as the ones we have now.  I realize "gc" may
> become a less accurate name, but oh, well.
> 
> > > Let's make it as simple and straightforward as possible.
> > 
> > I get the impression, however, that many reviewers here seem to favor the
> > goal of making the _patches_ as simple and straightforward as possible,
> > however, at the expense of the original goal. Like, totally sacrificing
> > the ease of use in return for "just use a shell script" advice.
> 
> I think we can have both.  They are not mutually exclusive, and I've
> proposed a suggestion for both.
> 
> > > As for handling multiple repositories, the tool to do that could be as
> > > simple as a shell script which reads from ~/.config/git/repo-maintenance
> > > (or whatever) and runs the same command on all of the repos it finds
> > > there, possibly with a subcommand to add and remove repos.
> > 
> > Sure, that is flexible.
> > 
> > And it requires a ton of Git expertise to know what to put into those
> > scripts. And Git updates cannot deliver more value to those scripts.
> 
> Perhaps I was unclear what I thought could be the design of this.  My
> proposal is something like the following:
> 
>   git schedule-gc add [--period=TIME] [--fetch=REMOTE | --fetch-all] REPO
>   git schedule-gc remove REPO
> 
> The actual command invoked by the system scheduler would be something
> like the following:
> 
>   git schedule-gc run
> 
> It would work as I proposed under the hood, but it would be relatively
> straightforward to use.

Regardless of what happens with the job-runner, I would like to see a
top-level command that performs a single iteration of all the
recommended maintenance steps, with zero configuration required, on a
single repo. This gives an entry point for users who want to manage
their own maintenance schedule without running a background process.
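
To make the idea concrete, here is a rough sketch of what one pass
could look like, written as a small Python wrapper rather than as the
builtin itself. The function names and the particular list of steps are
assumptions for illustration only; which steps belong in the
"recommended" set is exactly what is still being discussed. The git
subcommands it calls are existing ones.

```python
import subprocess
from typing import List


def maintenance_commands(repo: str) -> List[List[str]]:
    """Return the git invocations for one maintenance pass over `repo`.

    Hypothetical step list: these are existing git commands, but the
    actual recommended set has not been settled.
    """
    git = ["git", "-C", repo]
    return [
        git + ["commit-graph", "write", "--reachable"],
        git + ["gc", "--auto"],
    ]


def run_once(repo: str, dry_run: bool = False) -> int:
    """Run each step once, in order; return the number of failed steps."""
    failures = 0
    for cmd in maintenance_commands(repo):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures
```

The point is that there is no loop and no state: the caller decides
when and how often to invoke it, and a non-zero return tells the
scheduler that something went wrong.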


> > > I'm not opposed to seeing a tool that can schedule periodic maintenance
> > > jobs, perhaps in contrib, depending on whether other people think it
> > > should go.  However, I think running periodic jobs is best handled on
> > > Unix with cron or anacron and not a custom tool or a command in Git.
> > 
> > Okay, here is a challenge for you: design this such that the Windows
> > experience does _not_ feel like a 3rd-class citizen. Go ahead. Yes, there
> > is a scheduler. Yep, it does not do cron-like things. Precisely: you have
> > to feed it an XML to make use of the "advanced" features. Yeah, I also
> > cannot remember what the semantics are regarding missed jobs due to
> > shutdown cycles. Nope, you cannot rely on the XML being an option, that
> > would require Windows 10. The list goes on.
> 
> I will freely admit that I know next to nothing about Windows.  I have
> used it only incidentally, if at all, for at least two decades.  It is
> not a platform I generally have an interest in developing for, although
> I try to make it work as well as possible when I am working on a project
> which supports it.
> 
> It is, in general, my assumption, based on its wide usage, that it is a
> powerful and robust operating system with many features, but I have
> little actual knowledge about how it functions or the exact features it
> provides.
> 
> I want a solution that builds on the existing Unix tools for Unix,
> because that is least surprising to users and it is how Unix tools are
> supposed to work.  I think we can agree that Git was designed with the
> Unix philosophy in mind.
> 
> I also want a solution that works on Windows.  Ideally that solution
> would build on existing components that are part of Windows, because it
> reduces the maintenance burden on all of us.  But unfortunately, I know
> next to nothing about how to build such a solution.
> 
> > > I've dealt with systems that implemented periodic tasks without using
> > > the existing tools for doing that, and I've found that usually that's a
> > > mistake.  Despite seeming straightforward, there are a lot of tricky
> > > edge cases to deal with and it's easy to get wrong.
> > 
> > But maybe you found one of those issues in Stolee's patches? If so, please
> > do contribute your experience there to point out those issues, so that
> > they can be addressed.
> 
> One of the benefits of using anacron on Unix is that it can skip running
> tasks when the user is on battery.  This is not anything we can portably
> do across systems, nor is it something that Git should need to know
> about.
> 
> > > We also don't have to reimplement all the features in the system
> > > scheduler and can let expert users use a different tool of their choice
> > > instead if cron (or the Windows equivalent) is not to their liking.
> > 
> > Do we really want to start relying on `cron`, when the major platform used
> > by the target audience (enterprise software engineers who deal with rather
> > larger repositories than git.git or linux.git) quite obviously _lacks_
> > support for that?
> 
> Unix users will be unhappy with us if we use our own scheduling system
> when cron is available.  They will expect us to reimplement those
> features and they will complain if we do not.  While I cannot name
> names, there are a nontrivial number of large, enterprise monorepos that
> run only on macOS and Linux.

Speaking purely as a user, I agree with this point. This is why I want a
single-iteration top-level maintenance command.

Once we have that, we can provide recommended configs for existing
scheduling solutions (cron, launchd, systemd, etc.) in contrib/. If the
Windows scheduler is cumbersome enough that users don't want to use it,
then I think it's perfectly reasonable to provide our own limited
job-runner in contrib/ as well, so long as we don't require people to
use it.
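
As an illustration of how small such a recommended config could be,
the following sketch generates a daily crontab line for one repo. The
helper name, the default hour, and the use of "gc --auto" as a stand-in
for the not-yet-existing single-iteration command are all assumptions.

```python
import shlex


def crontab_entry(repo: str, command: str = "gc --auto", hour: int = 3) -> str:
    """Build a crontab line running `git -C <repo> <command>` daily at hour:00.

    `command` is a placeholder for whatever the single-iteration
    maintenance command ends up being called.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    git_cmd = "git -C {} {}".format(shlex.quote(repo), command)
    # crontab fields: minute hour day-of-month month day-of-week command
    return "0 {} * * * {}".format(hour, git_cmd)


print(crontab_entry("/home/me/src/project"))
# prints: 0 3 * * * git -C /home/me/src/project gc --auto
```

Equivalent snippets for launchd or systemd timers would carry the same
information (repo path, command, schedule) in their own formats.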

> That doesn't prevent us from building tooling that does the scheduling
> on Windows if we can't use the system scheduler, but it would be nice to
> try to present a relatively unified interface across the two platforms.
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204


