From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, rsbecker@nexbridge.com,
bagasdotme@gmail.com, newren@gmail.com, avarab@gmail.com,
nksingh85@gmail.com, sandals@crustytoothpaste.net,
"Neeraj K. Singh" <neerajsi@microsoft.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2] core.fsync: new option to harden references
Date: Fri, 25 Mar 2022 07:11:49 +0100 [thread overview]
Message-ID: <20220325061149.GA2571@szeder.dev> (raw)
In-Reply-To: <47dd79106b93bb81750320d50ccaa74c24aacd28.1646992380.git.ps@pks.im>
On Fri, Mar 11, 2022 at 10:58:59AM +0100, Patrick Steinhardt wrote:
> When writing both loose and packed references to disk we first create a
> lockfile, write the updated values into that lockfile, and on commit we
> rename the file into place. According to filesystem developers, this
> behaviour is broken because applications should always sync data to disk
> before doing the final rename to ensure data consistency [1][2][3]. If
> applications fail to do this correctly, a hard crash of the machine can
> easily result in corrupted on-disk data.
>
> This kind of corruption can in fact be easily observed with Git when the
> machine hard-resets shortly after writing references to disk. On
> machines with ext4, this will likely lead to the "empty files" problem:
> the file has been renamed, but its data has not been synced to disk. The
> result is that the reference is corrupt, and in the worst case this can
> lead to data loss.
>
> Implement a new option to harden references so that users and admins can
> avoid this scenario by syncing locked loose and packed references to
> disk before we rename them into place.
In 't5541-http-push-smart.sh' there is a test case called 'push 2000
tags over http', which does pretty much what it's title says. This
patch makes that test case significantly slower.
diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index 8ca50f8b18..d7e94cb791 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -415,7 +415,7 @@ test_expect_success CMDLINE_LIMIT 'push 2000 tags over http' '
sort |
sed "s|.*|$sha1 refs/tags/really-long-tag-name-&|" \
>.git/packed-refs &&
- run_with_limited_cmdline git push --mirror
+ run_with_limited_cmdline /usr/bin/time git push --mirror
'
test_expect_success GPG 'push with post-receive to inspect certificate' '
Before this patch (bc22d845c4^) 'time' reports:
3.62user 0.03system 0:03.83elapsed 95%CPU (0avgtext+0avgdata 11904maxresident)k
0inputs+312outputs (0major+4597minor)pagefaults 0swaps
With this patch (bc22d845c4):
3.56user 0.04system 0:33.60elapsed 10%CPU (0avgtext+0avgdata 11832maxresident)k
0inputs+320outputs (0major+4578minor)pagefaults 0swaps
And the total runtime of the whole test script increases from 8-9s to
37-39s.
I wonder whether we should relax the fsync options for this test case.
>
> [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
> [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename)
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>
> Hi,
>
> here's my updated patch series which implements syncing of refs. It
> applies on top of Neeraj's v6 of "A design for future-proofing fsync()
> configuration".
>
> I've simplified these patches a bit:
>
> - I don't distinguishing between "loose" and "packed" refs anymore.
> I agree with Junio that it's probably not worth it, but we can
> still reintroduce the split at a later point without breaking
> backwards compatibility if the need comes up.
>
> - I've simplified the way loose refs are written to disk so that we
> now sync them when before we close their files. The previous
> implementation I had was broken because we tried to sync after
> closing.
>
> Because this really only changes a few lines of code I've also decided
> to squash together the patches into a single one.
>
> Patrick
>
> Documentation/config/core.txt | 1 +
> cache.h | 7 +++++--
> config.c | 1 +
> refs/files-backend.c | 1 +
> refs/packed-backend.c | 3 ++-
> 5 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 37105a7be4..812cca7de7 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -575,6 +575,7 @@ but risks losing recent work in the event of an unclean system shutdown.
> * `index` hardens the index when it is modified.
> * `objects` is an aggregate option that is equivalent to
> `loose-object,pack`.
> +* `reference` hardens references modified in the repo.
> * `derived-metadata` is an aggregate option that is equivalent to
> `pack-metadata,commit-graph`.
> * `committed` is an aggregate option that is currently equivalent to
> diff --git a/cache.h b/cache.h
> index cde0900d05..033e5b0779 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1005,6 +1005,7 @@ enum fsync_component {
> FSYNC_COMPONENT_PACK_METADATA = 1 << 2,
> FSYNC_COMPONENT_COMMIT_GRAPH = 1 << 3,
> FSYNC_COMPONENT_INDEX = 1 << 4,
> + FSYNC_COMPONENT_REFERENCE = 1 << 5,
> };
>
> #define FSYNC_COMPONENTS_OBJECTS (FSYNC_COMPONENT_LOOSE_OBJECT | \
> @@ -1017,7 +1018,8 @@ enum fsync_component {
> FSYNC_COMPONENTS_DERIVED_METADATA | \
> ~FSYNC_COMPONENT_LOOSE_OBJECT)
>
> -#define FSYNC_COMPONENTS_COMMITTED (FSYNC_COMPONENTS_OBJECTS)
> +#define FSYNC_COMPONENTS_COMMITTED (FSYNC_COMPONENTS_OBJECTS | \
> + FSYNC_COMPONENT_REFERENCE)
>
> #define FSYNC_COMPONENTS_ADDED (FSYNC_COMPONENTS_COMMITTED | \
> FSYNC_COMPONENT_INDEX)
> @@ -1026,7 +1028,8 @@ enum fsync_component {
> FSYNC_COMPONENT_PACK | \
> FSYNC_COMPONENT_PACK_METADATA | \
> FSYNC_COMPONENT_COMMIT_GRAPH | \
> - FSYNC_COMPONENT_INDEX)
> + FSYNC_COMPONENT_INDEX | \
> + FSYNC_COMPONENT_REFERENCE)
>
> /*
> * A bitmask indicating which components of the repo should be fsynced.
> diff --git a/config.c b/config.c
> index eb75f65338..3c9b6b589a 100644
> --- a/config.c
> +++ b/config.c
> @@ -1333,6 +1333,7 @@ static const struct fsync_component_name {
> { "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH },
> { "index", FSYNC_COMPONENT_INDEX },
> { "objects", FSYNC_COMPONENTS_OBJECTS },
> + { "reference", FSYNC_COMPONENT_REFERENCE },
> { "derived-metadata", FSYNC_COMPONENTS_DERIVED_METADATA },
> { "committed", FSYNC_COMPONENTS_COMMITTED },
> { "added", FSYNC_COMPONENTS_ADDED },
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index f59589d6cc..6521ee8af5 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1787,6 +1787,7 @@ static int write_ref_to_lockfile(struct ref_lock *lock,
> fd = get_lock_file_fd(&lock->lk);
> if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 ||
> write_in_full(fd, &term, 1) < 0 ||
> + fsync_component(FSYNC_COMPONENT_REFERENCE, get_lock_file_fd(&lock->lk)) < 0 ||
> close_ref_gently(lock) < 0) {
> strbuf_addf(err,
> "couldn't write '%s'", get_lock_file_path(&lock->lk));
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 27dd8c3922..9d704ccd3e 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1262,7 +1262,8 @@ static int write_with_updates(struct packed_ref_store *refs,
> goto error;
> }
>
> - if (close_tempfile_gently(refs->tempfile)) {
> + if (fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
> + close_tempfile_gently(refs->tempfile)) {
> strbuf_addf(err, "error closing file %s: %s",
> get_tempfile_path(refs->tempfile),
> strerror(errno));
> --
> 2.35.1
>
prev parent reply other threads:[~2022-03-25 6:12 UTC|newest]
Thread overview: 122+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-04 3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04 3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06 7:54 ` Neeraj Singh
2021-12-04 3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44 ` Patrick Steinhardt
2021-12-07 12:14 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29 ` Neeraj Singh
2021-12-07 12:18 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53 ` Patrick Steinhardt
2021-12-07 20:46 ` Neeraj Singh
2021-12-07 12:29 ` Ævar Arnfjörð Bjarmason
2021-12-07 21:44 ` Neeraj Singh
2021-12-08 10:05 ` Ævar Arnfjörð Bjarmason
2021-12-09 0:14 ` Neeraj Singh
2021-12-09 0:44 ` Junio C Hamano
2021-12-09 4:08 ` Ævar Arnfjörð Bjarmason
2021-12-09 6:18 ` Neeraj Singh
2022-01-18 23:50 ` Neeraj Singh
2022-01-19 15:28 ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52 ` Ævar Arnfjörð Bjarmason
2022-01-28 1:28 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08 0:44 ` Neeraj Singh
2021-12-09 0:57 ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08 1:13 ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09 0:55 ` rsbecker
2022-01-10 19:00 ` Neeraj Singh
2022-02-01 3:33 ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02 0:51 ` Junio C Hamano
2022-02-02 1:42 ` Junio C Hamano
2022-02-11 21:18 ` Neeraj Singh
2022-02-11 22:19 ` Junio C Hamano
2022-02-11 23:04 ` Neeraj Singh
2022-02-11 23:15 ` Junio C Hamano
2022-02-12 0:39 ` rsbecker
2022-02-14 7:04 ` Patrick Steinhardt
2022-02-14 17:17 ` Junio C Hamano
2022-03-09 13:42 ` Patrick Steinhardt
2022-03-09 18:50 ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03 ` Junio C Hamano
2022-03-10 12:33 ` Patrick Steinhardt
2022-03-10 17:15 ` Junio C Hamano
2022-03-09 20:05 ` Neeraj Singh
2022-02-11 20:38 ` Neeraj Singh
2022-02-01 3:33 ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29 ` Junio C Hamano
2022-03-10 1:21 ` Neeraj Singh
2022-03-10 1:26 ` brian m. carlson
2022-03-10 1:56 ` Neeraj Singh
2022-03-09 23:03 ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10 0:21 ` Junio C Hamano
2022-03-10 2:53 ` Neeraj Singh
2022-03-10 7:19 ` Junio C Hamano
2022-03-10 18:38 ` Neeraj Singh
2022-03-10 18:44 ` Junio C Hamano
2022-03-10 19:57 ` Junio C Hamano
2022-03-10 20:25 ` Neeraj Singh
2022-03-10 21:17 ` Junio C Hamano
2022-03-10 13:11 ` Johannes Schindelin
2022-03-10 17:18 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10 9:53 ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10 9:53 ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34 ` Junio C Hamano
2022-03-10 18:40 ` Neeraj Singh
2022-03-10 9:53 ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25 ` Junio C Hamano
2022-03-10 19:03 ` Neeraj Singh
2022-03-10 22:54 ` Neeraj Singh
2022-03-11 6:40 ` Junio C Hamano
2022-03-11 9:15 ` Patrick Steinhardt
2022-03-11 9:36 ` Ævar Arnfjörð Bjarmason
2022-03-10 9:53 ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28 ` Junio C Hamano
2022-03-11 9:10 ` Patrick Steinhardt
2022-03-10 22:43 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06 ` Jiang Xin
2022-03-28 19:45 ` Neeraj Singh
2022-03-10 22:43 ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12 ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32 ` Junio C Hamano
2022-03-15 19:56 ` Neeraj Singh
2022-03-23 14:20 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Ævar Arnfjörð Bjarmason
2022-03-25 21:24 ` Neeraj Singh
2022-03-26 0:24 ` Ævar Arnfjörð Bjarmason
2022-03-26 1:23 ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26 1:25 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31 ` Ævar Arnfjörð Bjarmason
2022-03-27 5:27 ` Neeraj Singh
2022-03-27 12:43 ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56 ` Patrick Steinhardt
2022-03-28 11:25 ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56 ` Neeraj Singh
2022-03-30 16:59 ` Neeraj Singh
2022-03-10 23:34 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11 0:03 ` Neeraj Singh
2022-03-11 18:50 ` Neeraj Singh
2022-03-13 23:50 ` Junio C Hamano
2022-03-11 9:58 ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25 6:11 ` SZEDER Gábor [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220325061149.GA2571@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=avarab@gmail.com \
--cc=bagasdotme@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=neerajsi@microsoft.com \
--cc=newren@gmail.com \
--cc=nksingh85@gmail.com \
--cc=ps@pks.im \
--cc=rsbecker@nexbridge.com \
--cc=sandals@crustytoothpaste.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).