All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Neeraj Singh via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, rsbecker@nexbridge.com,
	bagasdotme@gmail.com, newren@gmail.com, nksingh85@gmail.com,
	ps@pks.im, Neeraj Singh <neerajsi@microsoft.com>
Subject: Re: [PATCH v2 2/3] core.fsync: introduce granular fsync control
Date: Tue, 07 Dec 2021 13:29:04 +0100	[thread overview]
Message-ID: <211207.86wnkgo9fv.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <ff80a94bf9add8a6fabcd5146e5177edf5e35e49.1638845211.git.gitgitgadget@gmail.com>


On Tue, Dec 07 2021, Neeraj Singh via GitGitGadget wrote:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> This commit introduces the `core.fsync` configuration
> knob which can be used to control how components of the
> repository are made durable on disk.
>
> This setting allows future extensibility of the list of
> syncable components:
> * We issue a warning rather than an error for unrecognized
>   components, so new configs can be used with old Git versions.

Looks good!

> * We support negation, so users can choose one of the default
>   aggregate options and then remove components that they don't
>   want. The user would then harden any new components added in
>   a Git version update.

I think this config schema makes sense, but just a (I think important)
comment on the "how" not "what" of it. It's really much better to define
config as:

    [some]
    key = value
    key = value2

Than:

    [some]
    key = value,value2

The reason is that "git config" has good support for working with
multi-valued stuff, so you can do e.g.:

    git config --get-all -z some.key

And you can easily (re)set such config e.g. with --replace-all etc., but
for comma-delimited you (and users) need to do all that work themselves.

Similarly instead of:

    some.key = want-this
    some.key = -not-this
    some.key = but-want-this

I think it's better to just have two lists, one inclusive another
exclusive. E.g. see "log.decorate" and "log.excludeDecoration",
"transfer.hideRefs"

Which would mean:

    core.fsync = want-this
    core.fsyncExcludes = -not-this

For some value of "fsyncExcludes", maybe "noFsync"? Anyway, just a
suggestion on making this easier for users & the implementation.

> This also supports the common request of doing absolutely no
> fysncing with the `core.fsync=none` value, which is expected
> to make the test suite faster.

Let's just use the git_parse_maybe_bool() or git_parse_maybe_bool_text()
so we'll accept "false", "off", "no" like most other such config?

> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  Documentation/config/core.txt | 27 +++++++++----
>  builtin/fast-import.c         |  2 +-
>  builtin/index-pack.c          |  4 +-
>  builtin/pack-objects.c        |  8 ++--
>  bulk-checkin.c                |  5 ++-
>  cache.h                       | 39 +++++++++++++++++-
>  commit-graph.c                |  3 +-
>  config.c                      | 76 ++++++++++++++++++++++++++++++++++-
>  csum-file.c                   |  5 ++-
>  csum-file.h                   |  3 +-
>  environment.c                 |  1 +
>  midx.c                        |  3 +-
>  object-file.c                 |  3 +-
>  pack-bitmap-write.c           |  3 +-
>  pack-write.c                  | 13 +++---
>  read-cache.c                  |  2 +-
>  16 files changed, 164 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index dbb134f7136..4f1747ec871 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -547,6 +547,25 @@ core.whitespace::
>    is relevant for `indent-with-non-tab` and when Git fixes `tab-in-indent`
>    errors. The default tab width is 8. Allowed values are 1 to 63.
>  
> +core.fsync::
> +	A comma-separated list of parts of the repository which should be
> +	hardened via the core.fsyncMethod when created or modified. You can
> +	disable hardening of any component by prefixing it with a '-'. Later
> +	items take precedence over earlier ones in the list. For example,
> +	`core.fsync=all,-pack-metadata` means "harden everything except pack
> +	metadata." Items that are not hardened may be lost in the event of an
> +	unclean system shutdown.
> ++
> +* `none` disables fsync completely. This must be specified alone.
> +* `loose-object` hardens objects added to the repo in loose-object form.
> +* `pack` hardens objects added to the repo in packfile form.
> +* `pack-metadata` hardens packfile bitmaps and indexes.
> +* `commit-graph` hardens the commit graph file.
> +* `objects` is an aggregate option that includes `loose-objects`, `pack`,
> +  `pack-metadata`, and `commit-graph`.
> +* `default` is an aggregate option that is equivalent to `objects,-loose-object`
> +* `all` is an aggregate option that syncs all individual components above.
> +

It's probably a *bit* more work to set up, but I wonder if this wouldn't
be simpler if we just said (and this is partially going against what I
noted above):

== BEGIN DOC

core.fsync is a multi-value config variable where each item is a
pathspec that'll get matched the same way as 'git-ls-files' et al.

When we sync pretend that a path like .git/objects/de/adbeef... is
relative to the top-level of the git
directory. E.g. "objects/de/adbeaf.." or "objects/pack/...".

You can then supply a list of wildcards and exclusions to configure
syncing.  or "false", "off" etc. to turn it off. These are synonymous
with:

    ; same as "false"
    core.fsync = ":!*"

Or:

    ; same as "true"
    core.fsync = "*"

Or, to selectively sync some things and not others:

    ;; Sync objects, but not "info"
    core.fsync = ":!objects/info/**"
    core.fsync = "objects/**"

See gitrepository-layout(5) for details about what sort of paths you
might be expected to match. Not all paths listed there will go through
this mechanism (e.g. currently objects do, but nothing to do with config
does).

We can and will match this against "fake paths", e.g. when writing out
packs we may match against just the string "objects/pack", we're not
going to re-check if every packfile we're writing matches your globs,
ditto for loose objects. Be reasonable!

This metharism is intended as a shorthand that provides some flexibility
when fsyncing, while not forcing git to come up with labels for all
paths the git dir, or to support crazyness like "objects/de/adbeef*"

More paths may be added or removed in the future, and we make no
promises that we won't move things around, so if in doubt use
e.g. "true" or a wide pattern match like "objects/**". When in doubt
stick to the golden path of examples provided in this documentation.

== END DOC


It's a tad more complex to set up, but I wonder if that isn't worth
it. It nicely gets around any current and future issues of deciding what
labels such as "loose-object" etc. to pick, as well as slotting into an
existing method of doing exclude/include lists.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 857be7826f3..916c55d6ce9 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1204,11 +1204,13 @@ static void write_pack_file(void)
>  		 * If so, rewrite it like in fast-import
>  		 */
>  		if (pack_to_stdout) {
> -			finalize_hashfile(f, hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
> +			finalize_hashfile(f, hash, FSYNC_COMPONENT_NONE,
> +					  CSUM_HASH_IN_STREAM | CSUM_CLOSE);

Not really related to this per-se, but since you're touching the API
everything goes through I wonder if callers should just always try to
fsync, and we can just catch EROFS and EINVAL in the wrapper if someone
tries to flush stdout, or catch the fd at that lower level.

Or maybe there's a good reason for this...

> [...]
> +/*
> + * These values are used to help identify parts of a repository to fsync.
> + * FSYNC_COMPONENT_NONE identifies data that will not be a persistent part of the
> + * repository and so shouldn't be fsynced.
> + */
> +enum fsync_component {
> +	FSYNC_COMPONENT_NONE			= 0,

I haven't read ahead much but in most other such cases we don't define
the "= 0", just start at 1<<0, then check the flags elsewhere...

> +static const struct fsync_component_entry {
> +	const char *name;
> +	enum fsync_component component_bits;
> +} fsync_component_table[] = {
> +	{ "loose-object", FSYNC_COMPONENT_LOOSE_OBJECT },
> +	{ "pack", FSYNC_COMPONENT_PACK },
> +	{ "pack-metadata", FSYNC_COMPONENT_PACK_METADATA },
> +	{ "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH },
> +	{ "objects", FSYNC_COMPONENTS_OBJECTS },
> +	{ "default", FSYNC_COMPONENTS_DEFAULT },
> +	{ "all", FSYNC_COMPONENTS_ALL },
> +};
> +
> +static enum fsync_component parse_fsync_components(const char *var, const char *string)
> +{
> +	enum fsync_component output = 0;
> +
> +	if (!strcmp(string, "none"))
> +		return output;
> +
> +	while (string) {
> +		int i;
> +		size_t len;
> +		const char *ep;
> +		int negated = 0;
> +		int found = 0;
> +
> +		string = string + strspn(string, ", \t\n\r");

Aside from the "use a list" isn't this hardcoding some windows-specific
assumptions with \n\r? Maybe not...

  parent reply	other threads:[~2021-12-07 13:01 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-04  3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04  3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06  7:54   ` Neeraj Singh
2021-12-04  3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07  2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07  2:46   ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44     ` Patrick Steinhardt
2021-12-07 12:14       ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29       ` Neeraj Singh
2021-12-07 12:18     ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58       ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53     ` Patrick Steinhardt
2021-12-07 20:46       ` Neeraj Singh
2021-12-07 12:29     ` Ævar Arnfjörð Bjarmason [this message]
2021-12-07 21:44       ` Neeraj Singh
2021-12-08 10:05         ` Ævar Arnfjörð Bjarmason
2021-12-09  0:14           ` Neeraj Singh
2021-12-09  0:44             ` Junio C Hamano
2021-12-09  4:08             ` Ævar Arnfjörð Bjarmason
2021-12-09  6:18               ` Neeraj Singh
2022-01-18 23:50                 ` Neeraj Singh
2022-01-19 15:28                   ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52                 ` Ævar Arnfjörð Bjarmason
2022-01-28  1:28                   ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56   ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08  0:44     ` Neeraj Singh
2021-12-09  0:57   ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08  1:13     ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09  0:55       ` rsbecker
2022-01-10 19:00         ` Neeraj Singh
2022-02-01  3:33     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02  0:51         ` Junio C Hamano
2022-02-02  1:42           ` Junio C Hamano
2022-02-11 21:18             ` Neeraj Singh
2022-02-11 22:19               ` Junio C Hamano
2022-02-11 23:04                 ` Neeraj Singh
2022-02-11 23:15                   ` Junio C Hamano
2022-02-12  0:39                     ` rsbecker
2022-02-14  7:04                     ` Patrick Steinhardt
2022-02-14 17:17                       ` Junio C Hamano
2022-03-09 13:42                         ` Patrick Steinhardt
2022-03-09 18:50                           ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03                           ` Junio C Hamano
2022-03-10 12:33                             ` Patrick Steinhardt
2022-03-10 17:15                               ` Junio C Hamano
2022-03-09 20:05                           ` Neeraj Singh
2022-02-11 20:38           ` Neeraj Singh
2022-02-01  3:33       ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03       ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29           ` Junio C Hamano
2022-03-10  1:21             ` Neeraj Singh
2022-03-10  1:26           ` brian m. carlson
2022-03-10  1:56             ` Neeraj Singh
2022-03-09 23:03         ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48           ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10  0:21           ` Junio C Hamano
2022-03-10  2:53             ` Neeraj Singh
2022-03-10  7:19               ` Junio C Hamano
2022-03-10 18:38                 ` Neeraj Singh
2022-03-10 18:44                   ` Junio C Hamano
2022-03-10 19:57                     ` Junio C Hamano
2022-03-10 20:25                       ` Neeraj Singh
2022-03-10 21:17                         ` Junio C Hamano
2022-03-10 13:11               ` Johannes Schindelin
2022-03-10 17:18               ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10  9:53         ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10  9:53         ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34           ` Junio C Hamano
2022-03-10 18:40             ` Neeraj Singh
2022-03-10  9:53         ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25           ` Junio C Hamano
2022-03-10 19:03             ` Neeraj Singh
2022-03-10 22:54           ` Neeraj Singh
2022-03-11  6:40           ` Junio C Hamano
2022-03-11  9:15             ` Patrick Steinhardt
2022-03-11  9:36               ` Ævar Arnfjörð Bjarmason
2022-03-10  9:53         ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28           ` Junio C Hamano
2022-03-11  9:10             ` Patrick Steinhardt
2022-03-10 22:43         ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06             ` Jiang Xin
2022-03-28 19:45               ` Neeraj Singh
2022-03-10 22:43           ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12             ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32               ` Junio C Hamano
2022-03-15 19:56                 ` Neeraj Singh
2022-03-23 14:20               ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Ævar Arnfjörð Bjarmason
2022-03-25 21:24                 ` Neeraj Singh
2022-03-26  0:24                   ` Ævar Arnfjörð Bjarmason
2022-03-26  1:23                     ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26  1:25                     ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31                       ` Ævar Arnfjörð Bjarmason
2022-03-27  5:27                         ` Neeraj Singh
2022-03-27 12:43                           ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56                             ` Patrick Steinhardt
2022-03-28 11:25                               ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56                                 ` Neeraj Singh
2022-03-30 16:59                                   ` Neeraj Singh
2022-03-10 23:34           ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11  0:03             ` Neeraj Singh
2022-03-11 18:50               ` Neeraj Singh
2022-03-13 23:50             ` Junio C Hamano
2022-03-11  9:58           ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25  6:11             ` SZEDER Gábor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211207.86wnkgo9fv.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=neerajsi@microsoft.com \
    --cc=newren@gmail.com \
    --cc=nksingh85@gmail.com \
    --cc=ps@pks.im \
    --cc=rsbecker@nexbridge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.