git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, rsbecker@nexbridge.com,
	bagasdotme@gmail.com, newren@gmail.com, avarab@gmail.com,
	nksingh85@gmail.com, sandals@crustytoothpaste.net,
	"Neeraj K. Singh" <neerajsi@microsoft.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2] core.fsync: new option to harden references
Date: Fri, 25 Mar 2022 07:11:49 +0100	[thread overview]
Message-ID: <20220325061149.GA2571@szeder.dev> (raw)
In-Reply-To: <47dd79106b93bb81750320d50ccaa74c24aacd28.1646992380.git.ps@pks.im>

On Fri, Mar 11, 2022 at 10:58:59AM +0100, Patrick Steinhardt wrote:
> When writing both loose and packed references to disk we first create a
> lockfile, write the updated values into that lockfile, and on commit we
> rename the file into place. According to filesystem developers, this
> behaviour is broken because applications should always sync data to disk
> before doing the final rename to ensure data consistency [1][2][3]. If
> applications fail to do this correctly, a hard crash of the machine can
> easily result in corrupted on-disk data.
> 
> This kind of corruption can in fact be easily observed with Git when the
> machine hard-resets shortly after writing references to disk. On
> machines with ext4, this will likely lead to the "empty files" problem:
> the file has been renamed, but its data has not been synced to disk. The
> result is that the reference is corrupt, and in the worst case this can
> lead to data loss.
> 
> Implement a new option to harden references so that users and admins can
> avoid this scenario by syncing locked loose and packed references to
> disk before we rename them into place.

In 't5541-http-push-smart.sh' there is a test case called 'push 2000
tags over http', which does pretty much what it's title says.  This
patch makes that test case significantly slower.


diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index 8ca50f8b18..d7e94cb791 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -415,7 +415,7 @@ test_expect_success CMDLINE_LIMIT 'push 2000 tags over http' '
 	  sort |
 	  sed "s|.*|$sha1 refs/tags/really-long-tag-name-&|" \
 	  >.git/packed-refs &&
-	run_with_limited_cmdline git push --mirror
+	run_with_limited_cmdline /usr/bin/time git push --mirror
 '
 
 test_expect_success GPG 'push with post-receive to inspect certificate' '

Before this patch (bc22d845c4^) 'time' reports:

  3.62user 0.03system 0:03.83elapsed 95%CPU (0avgtext+0avgdata 11904maxresident)k
  0inputs+312outputs (0major+4597minor)pagefaults 0swaps

With this patch (bc22d845c4):

  3.56user 0.04system 0:33.60elapsed 10%CPU (0avgtext+0avgdata 11832maxresident)k
  0inputs+320outputs (0major+4578minor)pagefaults 0swaps

And the total runtime of the whole test script increases from 8-9s to
37-39s.


I wonder whether we should relax the fsync options for this test case.

> 
> [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
> [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename)
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> 
> Hi,
> 
> here's my updated patch series which implements syncing of refs. It
> applies on top of Neeraj's v6 of "A design for future-proofing fsync()
> configuration".
> 
> I've simplified these patches a bit:
> 
>     - I don't distinguishing between "loose" and "packed" refs anymore.
>       I agree with Junio that it's probably not worth it, but we can
>       still reintroduce the split at a later point without breaking
>       backwards compatibility if the need comes up.
> 
>     - I've simplified the way loose refs are written to disk so that we
>       now sync them when before we close their files. The previous
>       implementation I had was broken because we tried to sync after
>       closing.
> 
> Because this really only changes a few lines of code I've also decided
> to squash together the patches into a single one.
> 
> Patrick
> 
>  Documentation/config/core.txt | 1 +
>  cache.h                       | 7 +++++--
>  config.c                      | 1 +
>  refs/files-backend.c          | 1 +
>  refs/packed-backend.c         | 3 ++-
>  5 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 37105a7be4..812cca7de7 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -575,6 +575,7 @@ but risks losing recent work in the event of an unclean system shutdown.
>  * `index` hardens the index when it is modified.
>  * `objects` is an aggregate option that is equivalent to
>    `loose-object,pack`.
> +* `reference` hardens references modified in the repo.
>  * `derived-metadata` is an aggregate option that is equivalent to
>    `pack-metadata,commit-graph`.
>  * `committed` is an aggregate option that is currently equivalent to
> diff --git a/cache.h b/cache.h
> index cde0900d05..033e5b0779 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1005,6 +1005,7 @@ enum fsync_component {
>  	FSYNC_COMPONENT_PACK_METADATA		= 1 << 2,
>  	FSYNC_COMPONENT_COMMIT_GRAPH		= 1 << 3,
>  	FSYNC_COMPONENT_INDEX			= 1 << 4,
> +	FSYNC_COMPONENT_REFERENCE		= 1 << 5,
>  };
>  
>  #define FSYNC_COMPONENTS_OBJECTS (FSYNC_COMPONENT_LOOSE_OBJECT | \
> @@ -1017,7 +1018,8 @@ enum fsync_component {
>  				  FSYNC_COMPONENTS_DERIVED_METADATA | \
>  				  ~FSYNC_COMPONENT_LOOSE_OBJECT)
>  
> -#define FSYNC_COMPONENTS_COMMITTED (FSYNC_COMPONENTS_OBJECTS)
> +#define FSYNC_COMPONENTS_COMMITTED (FSYNC_COMPONENTS_OBJECTS | \
> +				    FSYNC_COMPONENT_REFERENCE)
>  
>  #define FSYNC_COMPONENTS_ADDED (FSYNC_COMPONENTS_COMMITTED | \
>  				FSYNC_COMPONENT_INDEX)
> @@ -1026,7 +1028,8 @@ enum fsync_component {
>  			      FSYNC_COMPONENT_PACK | \
>  			      FSYNC_COMPONENT_PACK_METADATA | \
>  			      FSYNC_COMPONENT_COMMIT_GRAPH | \
> -			      FSYNC_COMPONENT_INDEX)
> +			      FSYNC_COMPONENT_INDEX | \
> +			      FSYNC_COMPONENT_REFERENCE)
>  
>  /*
>   * A bitmask indicating which components of the repo should be fsynced.
> diff --git a/config.c b/config.c
> index eb75f65338..3c9b6b589a 100644
> --- a/config.c
> +++ b/config.c
> @@ -1333,6 +1333,7 @@ static const struct fsync_component_name {
>  	{ "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH },
>  	{ "index", FSYNC_COMPONENT_INDEX },
>  	{ "objects", FSYNC_COMPONENTS_OBJECTS },
> +	{ "reference", FSYNC_COMPONENT_REFERENCE },
>  	{ "derived-metadata", FSYNC_COMPONENTS_DERIVED_METADATA },
>  	{ "committed", FSYNC_COMPONENTS_COMMITTED },
>  	{ "added", FSYNC_COMPONENTS_ADDED },
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index f59589d6cc..6521ee8af5 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1787,6 +1787,7 @@ static int write_ref_to_lockfile(struct ref_lock *lock,
>  	fd = get_lock_file_fd(&lock->lk);
>  	if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 ||
>  	    write_in_full(fd, &term, 1) < 0 ||
> +	    fsync_component(FSYNC_COMPONENT_REFERENCE, get_lock_file_fd(&lock->lk)) < 0 ||
>  	    close_ref_gently(lock) < 0) {
>  		strbuf_addf(err,
>  			    "couldn't write '%s'", get_lock_file_path(&lock->lk));
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 27dd8c3922..9d704ccd3e 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1262,7 +1262,8 @@ static int write_with_updates(struct packed_ref_store *refs,
>  		goto error;
>  	}
>  
> -	if (close_tempfile_gently(refs->tempfile)) {
> +	if (fsync_component(FSYNC_COMPONENT_REFERENCE, get_tempfile_fd(refs->tempfile)) ||
> +	    close_tempfile_gently(refs->tempfile)) {
>  		strbuf_addf(err, "error closing file %s: %s",
>  			    get_tempfile_path(refs->tempfile),
>  			    strerror(errno));
> -- 
> 2.35.1
> 



      reply	other threads:[~2022-03-25  6:12 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-04  3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04  3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06  7:54   ` Neeraj Singh
2021-12-04  3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07  2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07  2:46   ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44     ` Patrick Steinhardt
2021-12-07 12:14       ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29       ` Neeraj Singh
2021-12-07 12:18     ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58       ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53     ` Patrick Steinhardt
2021-12-07 20:46       ` Neeraj Singh
2021-12-07 12:29     ` Ævar Arnfjörð Bjarmason
2021-12-07 21:44       ` Neeraj Singh
2021-12-08 10:05         ` Ævar Arnfjörð Bjarmason
2021-12-09  0:14           ` Neeraj Singh
2021-12-09  0:44             ` Junio C Hamano
2021-12-09  4:08             ` Ævar Arnfjörð Bjarmason
2021-12-09  6:18               ` Neeraj Singh
2022-01-18 23:50                 ` Neeraj Singh
2022-01-19 15:28                   ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52                 ` Ævar Arnfjörð Bjarmason
2022-01-28  1:28                   ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56   ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08  0:44     ` Neeraj Singh
2021-12-09  0:57   ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08  1:13     ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09  0:55       ` rsbecker
2022-01-10 19:00         ` Neeraj Singh
2022-02-01  3:33     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02  0:51         ` Junio C Hamano
2022-02-02  1:42           ` Junio C Hamano
2022-02-11 21:18             ` Neeraj Singh
2022-02-11 22:19               ` Junio C Hamano
2022-02-11 23:04                 ` Neeraj Singh
2022-02-11 23:15                   ` Junio C Hamano
2022-02-12  0:39                     ` rsbecker
2022-02-14  7:04                     ` Patrick Steinhardt
2022-02-14 17:17                       ` Junio C Hamano
2022-03-09 13:42                         ` Patrick Steinhardt
2022-03-09 18:50                           ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03                           ` Junio C Hamano
2022-03-10 12:33                             ` Patrick Steinhardt
2022-03-10 17:15                               ` Junio C Hamano
2022-03-09 20:05                           ` Neeraj Singh
2022-02-11 20:38           ` Neeraj Singh
2022-02-01  3:33       ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03       ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29           ` Junio C Hamano
2022-03-10  1:21             ` Neeraj Singh
2022-03-10  1:26           ` brian m. carlson
2022-03-10  1:56             ` Neeraj Singh
2022-03-09 23:03         ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48           ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10  0:21           ` Junio C Hamano
2022-03-10  2:53             ` Neeraj Singh
2022-03-10  7:19               ` Junio C Hamano
2022-03-10 18:38                 ` Neeraj Singh
2022-03-10 18:44                   ` Junio C Hamano
2022-03-10 19:57                     ` Junio C Hamano
2022-03-10 20:25                       ` Neeraj Singh
2022-03-10 21:17                         ` Junio C Hamano
2022-03-10 13:11               ` Johannes Schindelin
2022-03-10 17:18               ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10  9:53         ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10  9:53         ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34           ` Junio C Hamano
2022-03-10 18:40             ` Neeraj Singh
2022-03-10  9:53         ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25           ` Junio C Hamano
2022-03-10 19:03             ` Neeraj Singh
2022-03-10 22:54           ` Neeraj Singh
2022-03-11  6:40           ` Junio C Hamano
2022-03-11  9:15             ` Patrick Steinhardt
2022-03-11  9:36               ` Ævar Arnfjörð Bjarmason
2022-03-10  9:53         ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28           ` Junio C Hamano
2022-03-11  9:10             ` Patrick Steinhardt
2022-03-10 22:43         ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06             ` Jiang Xin
2022-03-28 19:45               ` Neeraj Singh
2022-03-10 22:43           ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12             ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32               ` Junio C Hamano
2022-03-15 19:56                 ` Neeraj Singh
2022-03-23 14:20               ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Ævar Arnfjörð Bjarmason
2022-03-25 21:24                 ` Neeraj Singh
2022-03-26  0:24                   ` Ævar Arnfjörð Bjarmason
2022-03-26  1:23                     ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26  1:25                     ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31                       ` Ævar Arnfjörð Bjarmason
2022-03-27  5:27                         ` Neeraj Singh
2022-03-27 12:43                           ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56                             ` Patrick Steinhardt
2022-03-28 11:25                               ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56                                 ` Neeraj Singh
2022-03-30 16:59                                   ` Neeraj Singh
2022-03-10 23:34           ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11  0:03             ` Neeraj Singh
2022-03-11 18:50               ` Neeraj Singh
2022-03-13 23:50             ` Junio C Hamano
2022-03-11  9:58           ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25  6:11             ` SZEDER Gábor [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220325061149.GA2571@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=neerajsi@microsoft.com \
    --cc=newren@gmail.com \
    --cc=nksingh85@gmail.com \
    --cc=ps@pks.im \
    --cc=rsbecker@nexbridge.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).