git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Neeraj Singh <nksingh85@gmail.com>
Cc: "Neeraj K. Singh via GitGitGadget" <gitgitgadget@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Git List <git@vger.kernel.org>,
	Neeraj Singh <neerajsi@microsoft.com>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	"Randall S. Becker" <rsbecker@nexbridge.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options)
Date: Sat, 26 Mar 2022 01:24:57 +0100	[thread overview]
Message-ID: <220326.86ils1lfho.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <CANQDOdeeP8opTQj-j_j3=KnU99nYTnNYhyQmAojj=FZtZEkCZQ@mail.gmail.com>


On Fri, Mar 25 2022, Neeraj Singh wrote:

> On Wed, Mar 23, 2022 at 7:46 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Tue, Mar 15 2022, Neeraj Singh wrote:
>>
>> I know this is probably 80% my fault by egging you on about initially
>> adding the wildmatch() based thing you didn't go for.
>>
>> But having looked at this with fresh eyes quite deeply I really think
>> we're severely over-configuring things here:
>>
>> > +core.fsync::
>> > +     A comma-separated list of components of the repository that
>> > +     should be hardened via the core.fsyncMethod when created or
>> > +     modified.  You can disable hardening of any component by
>> > +     prefixing it with a '-'.  Items that are not hardened may be
>> > +     lost in the event of an unclean system shutdown. Unless you
>> > +     have special requirements, it is recommended that you leave
>> > +     this option empty or pick one of `committed`, `added`,
>> > +     or `all`.
>> > ++
>> > +When this configuration is encountered, the set of components starts with
>> > +the platform default value, disabled components are removed, and additional
>> > +components are added. `none` resets the state so that the platform default
>> > +is ignored.
>> > ++
>> > +The empty string resets the fsync configuration to the platform
>> > +default. The default on most platforms is equivalent to
>> > +`core.fsync=committed,-loose-object`, which has good performance,
>> > +but risks losing recent work in the event of an unclean system shutdown.
>> > ++
>> > +* `none` clears the set of fsynced components.
>> > +* `loose-object` hardens objects added to the repo in loose-object form.
>> > +* `pack` hardens objects added to the repo in packfile form.
>> > +* `pack-metadata` hardens packfile bitmaps and indexes.
>> > +* `commit-graph` hardens the commit graph file.
>> > +* `index` hardens the index when it is modified.
>> > +* `objects` is an aggregate option that is equivalent to
>> > +  `loose-object,pack`.
>> > +* `derived-metadata` is an aggregate option that is equivalent to
>> > +  `pack-metadata,commit-graph`.
>> > +* `committed` is an aggregate option that is currently equivalent to
>> > +  `objects`. This mode sacrifices some performance to ensure that work
>> > +  that is committed to the repository with `git commit` or similar commands
>> > +  is hardened.
>> > +* `added` is an aggregate option that is currently equivalent to
>> > +  `committed,index`. This mode sacrifices additional performance to
>> > +  ensure that the results of commands like `git add` and similar operations
>> > +  are hardened.
>> > +* `all` is an aggregate option that syncs all individual components above.
>> > +
>> >  core.fsyncMethod::
>> >       A value indicating the strategy Git will use to harden repository data
>> >       using fsync and related primitives.
>>
>> On top of my
>> https://lore.kernel.org/git/RFC-patch-v2-7.7-a5951366c6e-20220323T140753Z-avarab@gmail.com/
>> which makes the tmp-objdir part of your not-in-next-just-seen follow-up
>> series configurable via "fsyncMethod.batch.quarantine" I really think we
>> should just go for something like the belwo patch (note that
>> misspelled/mistook "bulk" for "batch" in that linked-t patch, fixed
>> below.
>>
>> I.e. I think we should just do our default fsync() of everything, and
>> probably SOON make the fsync-ing of loose objects the default. Those who
>> care about performance will have "batch" (or "writeout-only"), which we
>> can have OS-specific detections for.
>>
>> But really, all of the rest of this is unduly boxing us into
>> overconfiguration that I think nobody really needs.
>>
>
> We've gone over this a few times already, but just wanted to state it
> again.  The really detailed settings are really there for Git hosters
> like GitLab or GitHub. I'd be happy to remove the per-component
> core.fsync values from the documentation and leave just the ones we
> point the user to.

I'm prettty sure (but Patrick knows more) that GitLab's plan for this is
to keep it at whatever the safest setting is, presumably GitHub's as
well (but I don't know at all on that front).

>> If someone really needs this level of detail they can LD_PRELOAD
>> something to have fsync intercept fd's and paths, and act appropriately.
>>
>> Worse, as the RFC series I sent
>> (https://lore.kernel.org/git/RFC-cover-v2-0.7-00000000000-20220323T140753Z-avarab@gmail.com/)
>> shows we can and should "batch" up fsync() operations across these
>> configuration boundaries, which this level of configuration would seem
>> to preclude.
>>
>> Or, we'd need to explain why "core.fsync=loose-object" won't *actually*
>> call fsync() on a single loose object's fd under "batch" as I had to do
>> on top of this in
>> https://lore.kernel.org/git/RFC-patch-v2-6.7-c20301d7967-20220323T140753Z-avarab@gmail.com/
>>
>
> 99.9% of users don't care and won't look.  The ones who do look deeper
> and understand the issues have source code and access to this ML
> discussion to understand why this works this way.

Exactly, so we can hopefully have a simpler interface.

>> The same is going to apply for almost all of the rest of these
>> configuration categories.
>>
>> I.e. a natural follow-up to e.g. batching across objects & index as I'm
>> doing in
>> https://lore.kernel.org/git/RFC-patch-v2-4.7-61f4f3d7ef4-20220323T140753Z-avarab@gmail.com/
>> is to do likewise for all the PACK-related stuff before we rename it
>> in-place. Or even have "git gc" issue only a single fsync() for all of
>> PACKs, their metadata files, commit-graph etc., and then rename() things
>> in-place as appropriate afterwards.
>>
>> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
>> index 365a12dc7ae..536238e209b 100644
>> --- a/Documentation/config/core.txt
>> +++ b/Documentation/config/core.txt
>> @@ -548,49 +548,35 @@ core.whitespace::
>>    errors. The default tab width is 8. Allowed values are 1 to 63.
>>
>>  core.fsync::
>> -       A comma-separated list of components of the repository that
>> -       should be hardened via the core.fsyncMethod when created or
>> -       modified.  You can disable hardening of any component by
>> -       prefixing it with a '-'.  Items that are not hardened may be
>> -       lost in the event of an unclean system shutdown. Unless you
>> -       have special requirements, it is recommended that you leave
>> -       this option empty or pick one of `committed`, `added`,
>> -       or `all`.
>> -+
>> -When this configuration is encountered, the set of components starts with
>> -the platform default value, disabled components are removed, and additional
>> -components are added. `none` resets the state so that the platform default
>> -is ignored.
>> -+
>> -The empty string resets the fsync configuration to the platform
>> -default. The default on most platforms is equivalent to
>> -`core.fsync=committed,-loose-object`, which has good performance,
>> -but risks losing recent work in the event of an unclean system shutdown.
>> -+
>> -* `none` clears the set of fsynced components.
>> -* `loose-object` hardens objects added to the repo in loose-object form.
>> -* `pack` hardens objects added to the repo in packfile form.
>> -* `pack-metadata` hardens packfile bitmaps and indexes.
>> -* `commit-graph` hardens the commit graph file.
>> -* `index` hardens the index when it is modified.
>> -* `objects` is an aggregate option that is equivalent to
>> -  `loose-object,pack`.
>> -* `derived-metadata` is an aggregate option that is equivalent to
>> -  `pack-metadata,commit-graph`.
>> -* `committed` is an aggregate option that is currently equivalent to
>> -  `objects`. This mode sacrifices some performance to ensure that work
>> -  that is committed to the repository with `git commit` or similar commands
>> -  is hardened.
>> -* `added` is an aggregate option that is currently equivalent to
>> -  `committed,index`. This mode sacrifices additional performance to
>> -  ensure that the results of commands like `git add` and similar operations
>> -  are hardened.
>> -* `all` is an aggregate option that syncs all individual components above.
>> +       A boolen defaulting to `true`. To ensure data integrity git
>> +       will fsync() its objects, index and refu updates etc. This can
>> +       be set to `false` to disable `fsync()`-ing.
>> ++
>> +Only set this to `false` if you know what you're doing, and are
>> +prepared to deal with data corruption. Valid use-cases include
>> +throwaway uses of repositories on ramdisks, one-off mass-imports
>> +followed by calling `sync(1)` etc.
>> ++
>> +Note that the syncing of loose objects is currently excluded from
>> +`core.fsync=true`. To turn on all fsync-ing you'll need
>> +`core.fsync=true` and `core.fsyncObjectFiles=true`, but see
>> +`core.fsyncMethod=batch` below for a much faster alternative that's
>> +just as safe on various modern OS's.
>> ++
>> +The default is in flux and may change in the future, in particular the
>> +equivalent of the already-deprecated `core.fsyncObjectFiles` setting
>> +might soon default to `true`, and `core.fsyncMethod`'s default of
>> +`fsync` might default to a setting deemed to be safe on the local OS,
>> +suc has `batch` or `writeout-only`
>>
>>  core.fsyncMethod::
>>         A value indicating the strategy Git will use to harden repository data
>>         using fsync and related primitives.
>>  +
>> +Defaults to `fsync`, but as discussed for `core.fsync` above might
>> +change to use one of the values below taking advantage of
>> +platform-specific "faster `fsync()`".
>> ++
>>  * `fsync` uses the fsync() system call or platform equivalents.
>>  * `writeout-only` issues pagecache writeback requests, but depending on the
>>    filesystem and storage hardware, data added to the repository may not be
>> @@ -680,8 +666,8 @@ backed up by any standard (e.g. POSIX), but worked in practice on some
>>  Linux setups.
>>  +
>>  Nowadays you should almost certainly want to use
>> -`core.fsync=loose-object` instead in combination with
>> -`core.fsyncMethod=bulk`, and possibly with
>> +`core.fsync=true` instead in combination with
>> +`core.fsyncMethod=batch`, and possibly with
>>  `fsyncMethod.batch.quarantine=true`, see above. On modern OS's (Linux,
>>  OSX, Windows) that gives you most of the performance benefit of
>>  `core.fsyncObjectFiles=false` with all of the safety of the old
>
> I'm at the point where I don't want to endlessly revisit this discussion.

Sorry, my intention isn't to frustrate you, but I do think it's
important to get this right.

Particularly since this is now in "next", and we're getting closer to a
release. We can either talk about this now and decide on something, or
it'll be in a release, and then publicly documented promises will be
harder to back out of.

I think your suggestion of just hiding the relevant documentation would
be a good band-aid solution to that.

But I also think that given how I was altering this in my RFC series
that the premise of how this could be structured has been called into
question in a way that we didn't (or I don't recall) us having discussed
before.

I.e. that we can say "sync loose, but not index", or "sync index, but
not loose" with this config schema. When with "bulk" we it really isn't
any more expensive to do both if one is true (even cheaper, actually).


  reply	other threads:[~2022-03-26  0:33 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-04  3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04  3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06  7:54   ` Neeraj Singh
2021-12-04  3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07  2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07  2:46   ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44     ` Patrick Steinhardt
2021-12-07 12:14       ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29       ` Neeraj Singh
2021-12-07 12:18     ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58       ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53     ` Patrick Steinhardt
2021-12-07 20:46       ` Neeraj Singh
2021-12-07 12:29     ` Ævar Arnfjörð Bjarmason
2021-12-07 21:44       ` Neeraj Singh
2021-12-08 10:05         ` Ævar Arnfjörð Bjarmason
2021-12-09  0:14           ` Neeraj Singh
2021-12-09  0:44             ` Junio C Hamano
2021-12-09  4:08             ` Ævar Arnfjörð Bjarmason
2021-12-09  6:18               ` Neeraj Singh
2022-01-18 23:50                 ` Neeraj Singh
2022-01-19 15:28                   ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52                 ` Ævar Arnfjörð Bjarmason
2022-01-28  1:28                   ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56   ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08  0:44     ` Neeraj Singh
2021-12-09  0:57   ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08  1:13     ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09  0:55       ` rsbecker
2022-01-10 19:00         ` Neeraj Singh
2022-02-01  3:33     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02  0:51         ` Junio C Hamano
2022-02-02  1:42           ` Junio C Hamano
2022-02-11 21:18             ` Neeraj Singh
2022-02-11 22:19               ` Junio C Hamano
2022-02-11 23:04                 ` Neeraj Singh
2022-02-11 23:15                   ` Junio C Hamano
2022-02-12  0:39                     ` rsbecker
2022-02-14  7:04                     ` Patrick Steinhardt
2022-02-14 17:17                       ` Junio C Hamano
2022-03-09 13:42                         ` Patrick Steinhardt
2022-03-09 18:50                           ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03                           ` Junio C Hamano
2022-03-10 12:33                             ` Patrick Steinhardt
2022-03-10 17:15                               ` Junio C Hamano
2022-03-09 20:05                           ` Neeraj Singh
2022-02-11 20:38           ` Neeraj Singh
2022-02-01  3:33       ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03       ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29           ` Junio C Hamano
2022-03-10  1:21             ` Neeraj Singh
2022-03-10  1:26           ` brian m. carlson
2022-03-10  1:56             ` Neeraj Singh
2022-03-09 23:03         ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48           ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10  0:21           ` Junio C Hamano
2022-03-10  2:53             ` Neeraj Singh
2022-03-10  7:19               ` Junio C Hamano
2022-03-10 18:38                 ` Neeraj Singh
2022-03-10 18:44                   ` Junio C Hamano
2022-03-10 19:57                     ` Junio C Hamano
2022-03-10 20:25                       ` Neeraj Singh
2022-03-10 21:17                         ` Junio C Hamano
2022-03-10 13:11               ` Johannes Schindelin
2022-03-10 17:18               ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10  9:53         ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10  9:53         ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34           ` Junio C Hamano
2022-03-10 18:40             ` Neeraj Singh
2022-03-10  9:53         ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25           ` Junio C Hamano
2022-03-10 19:03             ` Neeraj Singh
2022-03-10 22:54           ` Neeraj Singh
2022-03-11  6:40           ` Junio C Hamano
2022-03-11  9:15             ` Patrick Steinhardt
2022-03-11  9:36               ` Ævar Arnfjörð Bjarmason
2022-03-10  9:53         ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28           ` Junio C Hamano
2022-03-11  9:10             ` Patrick Steinhardt
2022-03-10 22:43         ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06             ` Jiang Xin
2022-03-28 19:45               ` Neeraj Singh
2022-03-10 22:43           ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12             ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32               ` Junio C Hamano
2022-03-15 19:56                 ` Neeraj Singh
2022-03-23 14:20               ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Ævar Arnfjörð Bjarmason
2022-03-25 21:24                 ` Neeraj Singh
2022-03-26  0:24                   ` Ævar Arnfjörð Bjarmason [this message]
2022-03-26  1:23                     ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26  1:25                     ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31                       ` Ævar Arnfjörð Bjarmason
2022-03-27  5:27                         ` Neeraj Singh
2022-03-27 12:43                           ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56                             ` Patrick Steinhardt
2022-03-28 11:25                               ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56                                 ` Neeraj Singh
2022-03-30 16:59                                   ` Neeraj Singh
2022-03-10 23:34           ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11  0:03             ` Neeraj Singh
2022-03-11 18:50               ` Neeraj Singh
2022-03-13 23:50             ` Junio C Hamano
2022-03-11  9:58           ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25  6:11             ` SZEDER Gábor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220326.86ils1lfho.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=bagasdotme@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=neerajsi@microsoft.com \
    --cc=newren@gmail.com \
    --cc=nksingh85@gmail.com \
    --cc=ps@pks.im \
    --cc=rsbecker@nexbridge.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).