From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Neeraj Singh <nksingh85@gmail.com>
Cc: gitgitgadget@gmail.com, Johannes.Schindelin@gmx.de,
bagasdotme@gmail.com, git@vger.kernel.org,
neerajsi@microsoft.com, newren@gmail.com, ps@pks.im,
rsbecker@nexbridge.com, sandals@crustytoothpaste.net
Subject: do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options)
Date: Wed, 23 Mar 2022 15:20:52 +0100 [thread overview]
Message-ID: <220323.86fsn8ohg8.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20220315191245.17990-1-neerajsi@microsoft.com>
On Tue, Mar 15 2022, Neeraj Singh wrote:
I know this is probably 80% my fault by egging you on about initially
adding the wildmatch() based thing you didn't go for.
But having looked at this with fresh eyes quite deeply I really think
we're severely over-configuring things here:
> +core.fsync::
> + A comma-separated list of components of the repository that
> + should be hardened via the core.fsyncMethod when created or
> + modified. You can disable hardening of any component by
> + prefixing it with a '-'. Items that are not hardened may be
> + lost in the event of an unclean system shutdown. Unless you
> + have special requirements, it is recommended that you leave
> + this option empty or pick one of `committed`, `added`,
> + or `all`.
> ++
> +When this configuration is encountered, the set of components starts with
> +the platform default value, disabled components are removed, and additional
> +components are added. `none` resets the state so that the platform default
> +is ignored.
> ++
> +The empty string resets the fsync configuration to the platform
> +default. The default on most platforms is equivalent to
> +`core.fsync=committed,-loose-object`, which has good performance,
> +but risks losing recent work in the event of an unclean system shutdown.
> ++
> +* `none` clears the set of fsynced components.
> +* `loose-object` hardens objects added to the repo in loose-object form.
> +* `pack` hardens objects added to the repo in packfile form.
> +* `pack-metadata` hardens packfile bitmaps and indexes.
> +* `commit-graph` hardens the commit graph file.
> +* `index` hardens the index when it is modified.
> +* `objects` is an aggregate option that is equivalent to
> + `loose-object,pack`.
> +* `derived-metadata` is an aggregate option that is equivalent to
> + `pack-metadata,commit-graph`.
> +* `committed` is an aggregate option that is currently equivalent to
> + `objects`. This mode sacrifices some performance to ensure that work
> + that is committed to the repository with `git commit` or similar commands
> + is hardened.
> +* `added` is an aggregate option that is currently equivalent to
> + `committed,index`. This mode sacrifices additional performance to
> + ensure that the results of commands like `git add` and similar operations
> + are hardened.
> +* `all` is an aggregate option that syncs all individual components above.
> +
> core.fsyncMethod::
> A value indicating the strategy Git will use to harden repository data
> using fsync and related primitives.
On top of my
https://lore.kernel.org/git/RFC-patch-v2-7.7-a5951366c6e-20220323T140753Z-avarab@gmail.com/
which makes the tmp-objdir part of your not-in-next-just-seen follow-up
series configurable via "fsyncMethod.batch.quarantine" I really think we
should just go for something like the belwo patch (note that
misspelled/mistook "bulk" for "batch" in that linked-t patch, fixed
below.
I.e. I think we should just do our default fsync() of everything, and
probably SOON make the fsync-ing of loose objects the default. Those who
care about performance will have "batch" (or "writeout-only"), which we
can have OS-specific detections for.
But really, all of the rest of this is unduly boxing us into
overconfiguration that I think nobody really needs.
If someone really needs this level of detail they can LD_PRELOAD
something to have fsync intercept fd's and paths, and act appropriately.
Worse, as the RFC series I sent
(https://lore.kernel.org/git/RFC-cover-v2-0.7-00000000000-20220323T140753Z-avarab@gmail.com/)
shows we can and should "batch" up fsync() operations across these
configuration boundaries, which this level of configuration would seem
to preclude.
Or, we'd need to explain why "core.fsync=loose-object" won't *actually*
call fsync() on a single loose object's fd under "batch" as I had to do
on top of this in
https://lore.kernel.org/git/RFC-patch-v2-6.7-c20301d7967-20220323T140753Z-avarab@gmail.com/
The same is going to apply for almost all of the rest of these
configuration categories.
I.e. a natural follow-up to e.g. batching across objects & index as I'm
doing in
https://lore.kernel.org/git/RFC-patch-v2-4.7-61f4f3d7ef4-20220323T140753Z-avarab@gmail.com/
is to do likewise for all the PACK-related stuff before we rename it
in-place. Or even have "git gc" issue only a single fsync() for all of
PACKs, their metadata files, commit-graph etc., and then rename() things
in-place as appropriate afterwards.
diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 365a12dc7ae..536238e209b 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,49 +548,35 @@ core.whitespace::
errors. The default tab width is 8. Allowed values are 1 to 63.
core.fsync::
- A comma-separated list of components of the repository that
- should be hardened via the core.fsyncMethod when created or
- modified. You can disable hardening of any component by
- prefixing it with a '-'. Items that are not hardened may be
- lost in the event of an unclean system shutdown. Unless you
- have special requirements, it is recommended that you leave
- this option empty or pick one of `committed`, `added`,
- or `all`.
-+
-When this configuration is encountered, the set of components starts with
-the platform default value, disabled components are removed, and additional
-components are added. `none` resets the state so that the platform default
-is ignored.
-+
-The empty string resets the fsync configuration to the platform
-default. The default on most platforms is equivalent to
-`core.fsync=committed,-loose-object`, which has good performance,
-but risks losing recent work in the event of an unclean system shutdown.
-+
-* `none` clears the set of fsynced components.
-* `loose-object` hardens objects added to the repo in loose-object form.
-* `pack` hardens objects added to the repo in packfile form.
-* `pack-metadata` hardens packfile bitmaps and indexes.
-* `commit-graph` hardens the commit graph file.
-* `index` hardens the index when it is modified.
-* `objects` is an aggregate option that is equivalent to
- `loose-object,pack`.
-* `derived-metadata` is an aggregate option that is equivalent to
- `pack-metadata,commit-graph`.
-* `committed` is an aggregate option that is currently equivalent to
- `objects`. This mode sacrifices some performance to ensure that work
- that is committed to the repository with `git commit` or similar commands
- is hardened.
-* `added` is an aggregate option that is currently equivalent to
- `committed,index`. This mode sacrifices additional performance to
- ensure that the results of commands like `git add` and similar operations
- are hardened.
-* `all` is an aggregate option that syncs all individual components above.
+ A boolen defaulting to `true`. To ensure data integrity git
+ will fsync() its objects, index and refu updates etc. This can
+ be set to `false` to disable `fsync()`-ing.
++
+Only set this to `false` if you know what you're doing, and are
+prepared to deal with data corruption. Valid use-cases include
+throwaway uses of repositories on ramdisks, one-off mass-imports
+followed by calling `sync(1)` etc.
++
+Note that the syncing of loose objects is currently excluded from
+`core.fsync=true`. To turn on all fsync-ing you'll need
+`core.fsync=true` and `core.fsyncObjectFiles=true`, but see
+`core.fsyncMethod=batch` below for a much faster alternative that's
+just as safe on various modern OS's.
++
+The default is in flux and may change in the future, in particular the
+equivalent of the already-deprecated `core.fsyncObjectFiles` setting
+might soon default to `true`, and `core.fsyncMethod`'s default of
+`fsync` might default to a setting deemed to be safe on the local OS,
+suc has `batch` or `writeout-only`
core.fsyncMethod::
A value indicating the strategy Git will use to harden repository data
using fsync and related primitives.
+
+Defaults to `fsync`, but as discussed for `core.fsync` above might
+change to use one of the values below taking advantage of
+platform-specific "faster `fsync()`".
++
* `fsync` uses the fsync() system call or platform equivalents.
* `writeout-only` issues pagecache writeback requests, but depending on the
filesystem and storage hardware, data added to the repository may not be
@@ -680,8 +666,8 @@ backed up by any standard (e.g. POSIX), but worked in practice on some
Linux setups.
+
Nowadays you should almost certainly want to use
-`core.fsync=loose-object` instead in combination with
-`core.fsyncMethod=bulk`, and possibly with
+`core.fsync=true` instead in combination with
+`core.fsyncMethod=batch`, and possibly with
`fsyncMethod.batch.quarantine=true`, see above. On modern OS's (Linux,
OSX, Windows) that gives you most of the performance benefit of
`core.fsyncObjectFiles=false` with all of the safety of the old
next prev parent reply other threads:[~2022-03-23 14:46 UTC|newest]
Thread overview: 122+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-04 3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04 3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06 7:54 ` Neeraj Singh
2021-12-04 3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44 ` Patrick Steinhardt
2021-12-07 12:14 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29 ` Neeraj Singh
2021-12-07 12:18 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53 ` Patrick Steinhardt
2021-12-07 20:46 ` Neeraj Singh
2021-12-07 12:29 ` Ævar Arnfjörð Bjarmason
2021-12-07 21:44 ` Neeraj Singh
2021-12-08 10:05 ` Ævar Arnfjörð Bjarmason
2021-12-09 0:14 ` Neeraj Singh
2021-12-09 0:44 ` Junio C Hamano
2021-12-09 4:08 ` Ævar Arnfjörð Bjarmason
2021-12-09 6:18 ` Neeraj Singh
2022-01-18 23:50 ` Neeraj Singh
2022-01-19 15:28 ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52 ` Ævar Arnfjörð Bjarmason
2022-01-28 1:28 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08 0:44 ` Neeraj Singh
2021-12-09 0:57 ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08 1:13 ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09 0:55 ` rsbecker
2022-01-10 19:00 ` Neeraj Singh
2022-02-01 3:33 ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02 0:51 ` Junio C Hamano
2022-02-02 1:42 ` Junio C Hamano
2022-02-11 21:18 ` Neeraj Singh
2022-02-11 22:19 ` Junio C Hamano
2022-02-11 23:04 ` Neeraj Singh
2022-02-11 23:15 ` Junio C Hamano
2022-02-12 0:39 ` rsbecker
2022-02-14 7:04 ` Patrick Steinhardt
2022-02-14 17:17 ` Junio C Hamano
2022-03-09 13:42 ` Patrick Steinhardt
2022-03-09 18:50 ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03 ` Junio C Hamano
2022-03-10 12:33 ` Patrick Steinhardt
2022-03-10 17:15 ` Junio C Hamano
2022-03-09 20:05 ` Neeraj Singh
2022-02-11 20:38 ` Neeraj Singh
2022-02-01 3:33 ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29 ` Junio C Hamano
2022-03-10 1:21 ` Neeraj Singh
2022-03-10 1:26 ` brian m. carlson
2022-03-10 1:56 ` Neeraj Singh
2022-03-09 23:03 ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10 0:21 ` Junio C Hamano
2022-03-10 2:53 ` Neeraj Singh
2022-03-10 7:19 ` Junio C Hamano
2022-03-10 18:38 ` Neeraj Singh
2022-03-10 18:44 ` Junio C Hamano
2022-03-10 19:57 ` Junio C Hamano
2022-03-10 20:25 ` Neeraj Singh
2022-03-10 21:17 ` Junio C Hamano
2022-03-10 13:11 ` Johannes Schindelin
2022-03-10 17:18 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10 9:53 ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10 9:53 ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34 ` Junio C Hamano
2022-03-10 18:40 ` Neeraj Singh
2022-03-10 9:53 ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25 ` Junio C Hamano
2022-03-10 19:03 ` Neeraj Singh
2022-03-10 22:54 ` Neeraj Singh
2022-03-11 6:40 ` Junio C Hamano
2022-03-11 9:15 ` Patrick Steinhardt
2022-03-11 9:36 ` Ævar Arnfjörð Bjarmason
2022-03-10 9:53 ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28 ` Junio C Hamano
2022-03-11 9:10 ` Patrick Steinhardt
2022-03-10 22:43 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06 ` Jiang Xin
2022-03-28 19:45 ` Neeraj Singh
2022-03-10 22:43 ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12 ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32 ` Junio C Hamano
2022-03-15 19:56 ` Neeraj Singh
2022-03-23 14:20 ` Ævar Arnfjörð Bjarmason [this message]
2022-03-25 21:24 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 0:24 ` Ævar Arnfjörð Bjarmason
2022-03-26 1:23 ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26 1:25 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31 ` Ævar Arnfjörð Bjarmason
2022-03-27 5:27 ` Neeraj Singh
2022-03-27 12:43 ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56 ` Patrick Steinhardt
2022-03-28 11:25 ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56 ` Neeraj Singh
2022-03-30 16:59 ` Neeraj Singh
2022-03-10 23:34 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11 0:03 ` Neeraj Singh
2022-03-11 18:50 ` Neeraj Singh
2022-03-13 23:50 ` Junio C Hamano
2022-03-11 9:58 ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25 6:11 ` SZEDER Gábor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=220323.86fsn8ohg8.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=bagasdotme@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=neerajsi@microsoft.com \
--cc=newren@gmail.com \
--cc=nksingh85@gmail.com \
--cc=ps@pks.im \
--cc=rsbecker@nexbridge.com \
--cc=sandals@crustytoothpaste.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.