From: "Neeraj K. Singh via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: rsbecker@nexbridge.com, bagasdotme@gmail.com, newren@gmail.com,
avarab@gmail.com, nksingh85@gmail.com, ps@pks.im,
"Neeraj K. Singh" <neerajsi@microsoft.com>
Subject: [PATCH v4 0/4] A design for future-proofing fsync() configuration
Date: Tue, 01 Feb 2022 03:33:40 +0000 [thread overview]
Message-ID: <pull.1093.v4.git.1643686424.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1093.v3.git.1639011433.gitgitgadget@gmail.com>
This is an implementation of an extensible configuration mechanism for
fsyncing persistent components of a repo.
The main goals are to separate the "what" to sync from the "how". There are
now two settings: core.fsync - Control the 'what', including the index.
core.fsyncMethod - Control the 'how'. Currently we support writeout-only and
full fsync.
Syncing of refs can be layered on top of core.fsync. And batch mode will be
layered on core.fsyncMethod.
core.fsyncObjectfiles is removed and will issue a deprecation warning if
it's seen.
I'd like to get agreement on this direction before submitting batch mode to
the list. The batch mode series is available to view at
https://github.com/gitgitgadget/git/pull/1134
Please see [1], [2], and [3] for discussions that led to this series.
After this change, new persistent data files added to the repo will need to
be added to the fsync_component enum and documented in the
Documentation/config/core.txt text.
V4 changes:
* Rebase onto master at b23dac905bd.
* Add a comment to write_pack_file indicating why we don't fsync when
writing to stdout.
* I kept the configuration schema as-is rather than switching to
multi-value. The thinking here is that a stateless last-one-wins config
schema (comma separated) will make it easier to achieve some holistic
self-consistent fsync configuration for a particular repo.
V3 changes:
* Remove relative path from git-compat-util.h include [4].
* Updated newly added warning texts to have more context for localization
[4].
* Fixed tab spacing in enum fsync_action
* Moved the fsync looping out to a helper and do it consistently. [4]
* Changed commit description to use camelCase for config names. [5]
* Add an optional fourth patch with derived-metadata so that the user can
exclude a forward-compatible set of things that should be recomputable
given existing data.
V2 changes:
* Updated the documentation for core.fsyncmethod to be less certain.
writeout-only probably does not do the right thing on Linux.
* Split out the core.fsync=index change into its own commit.
* Rename REPO_COMPONENT to FSYNC_COMPONENT. This is really specific to
fsyncing, so the name should reflect that.
* Re-add missing Makefile change for SYNC_FILE_RANGE.
* Tested writeout-only mode, index syncing, and general config settings.
[1] https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
[2]
https://lore.kernel.org/git/dd65718814011eb93ccc4428f9882e0f025224a6.1636029491.git.ps@pks.im/
[3]
https://lore.kernel.org/git/pull.1076.git.git.1629856292.gitgitgadget@gmail.com/
[4]
https://lore.kernel.org/git/CANQDOdf8C4-haK9=Q_J4Cid8bQALnmGDm=SvatRbaVf+tkzqLw@mail.gmail.com/
[5] https://lore.kernel.org/git/211207.861r2opplg.gmgdl@evledraar.gmail.com/
Neeraj Singh (4):
core.fsyncmethod: add writeout-only mode
core.fsync: introduce granular fsync control
core.fsync: new option to harden the index
core.fsync: add a `derived-metadata` aggregate option
Documentation/config/core.txt | 35 ++++++++---
Makefile | 6 ++
builtin/fast-import.c | 2 +-
builtin/index-pack.c | 4 +-
builtin/pack-objects.c | 24 +++++---
bulk-checkin.c | 5 +-
cache.h | 49 +++++++++++++++-
commit-graph.c | 3 +-
compat/mingw.h | 3 +
compat/win32/flush.c | 28 +++++++++
config.c | 90 ++++++++++++++++++++++++++++-
config.mak.uname | 3 +
configure.ac | 8 +++
contrib/buildsystems/CMakeLists.txt | 3 +-
csum-file.c | 5 +-
csum-file.h | 3 +-
environment.c | 3 +-
git-compat-util.h | 24 ++++++++
midx.c | 3 +-
object-file.c | 3 +-
pack-bitmap-write.c | 3 +-
pack-write.c | 13 +++--
read-cache.c | 19 ++++--
wrapper.c | 64 ++++++++++++++++++++
write-or-die.c | 11 ++--
25 files changed, 367 insertions(+), 47 deletions(-)
create mode 100644 compat/win32/flush.c
base-commit: b23dac905bde28da47543484320db16312c87551
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1093%2Fneerajsi-msft%2Fns%2Fcore-fsync-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1093/neerajsi-msft/ns/core-fsync-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1093
Range-diff vs v3:
1: 15edfe51509 ! 1: 51a218d100d core.fsyncmethod: add writeout-only mode
@@ Makefile: ifdef HAVE_CLOCK_MONOTONIC
endif
## cache.h ##
-@@ cache.h: extern int read_replace_refs;
- extern char *git_replace_ref_base;
+@@ cache.h: extern char *git_replace_ref_base;
extern int fsync_object_files;
+ extern int use_fsync;
+
+enum fsync_method {
+ FSYNC_METHOD_FSYNC,
@@ compat/win32/flush.c (new)
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
-+ DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
++ DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NTAPI, NtFlushBuffersFileEx,
+ HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+ PIO_STATUS_BLOCK IoStatusBlock);
+
@@ contrib/buildsystems/CMakeLists.txt: if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
compat/nedmalloc/nedmalloc.c compat/strdup.c)
## environment.c ##
-@@ environment.c: const char *git_attributes_file;
- const char *git_hooks_path;
- int zlib_compression_level = Z_BEST_SPEED;
+@@ environment.c: int zlib_compression_level = Z_BEST_SPEED;
int pack_compression_level = Z_DEFAULT_COMPRESSION;
--int fsync_object_files;
+ int fsync_object_files;
+ int use_fsync = -1;
+enum fsync_method fsync_method = FSYNC_METHOD_DEFAULT;
size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
@@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
int err;
## write-or-die.c ##
-@@ write-or-die.c: void fprintf_or_die(FILE *f, const char *fmt, ...)
-
- void fsync_or_die(int fd, const char *msg)
- {
+@@ write-or-die.c: void fsync_or_die(int fd, const char *msg)
+ use_fsync = git_env_bool("GIT_TEST_FSYNC", 1);
+ if (!use_fsync)
+ return;
- while (fsync(fd) < 0) {
- if (errno != EINTR)
- die_errno("fsync error on '%s'", msg);
- }
++
+ if (fsync_method == FSYNC_METHOD_WRITEOUT_ONLY &&
+ git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0)
+ return;
2: 080be1a6f64 ! 2: 7a164ba9571 core.fsync: introduce granular fsync control
@@ builtin/index-pack.c: static void final(const char *final_pack_name, const char
## builtin/pack-objects.c ##
@@ builtin/pack-objects.c: static void write_pack_file(void)
- * If so, rewrite it like in fast-import
- */
+ display_progress(progress_state, written);
+ }
+
+- /*
+- * Did we write the wrong # entries in the header?
+- * If so, rewrite it like in fast-import
+- */
if (pack_to_stdout) {
- finalize_hashfile(f, hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
++ /*
++ * We never fsync when writing to stdout since we may
++ * not be writing to an actual pack file. For instance,
++ * the upload-pack code passes a pipe here. Calling
++ * fsync on a pipe results in unnecessary
++ * synchronization with the reader on some platforms.
++ */
+ finalize_hashfile(f, hash, FSYNC_COMPONENT_NONE,
+ CSUM_HASH_IN_STREAM | CSUM_CLOSE);
} else if (nr_written == nr_remaining) {
@@ builtin/pack-objects.c: static void write_pack_file(void)
+ CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
} else {
- int fd = finalize_hashfile(f, hash, 0);
++ /*
++ * If we wrote the wrong number of entries in the
++ * header, rewrite it like in fast-import.
++ */
++
+ int fd = finalize_hashfile(f, hash, FSYNC_COMPONENT_PACK, 0);
fixup_pack_header_footer(fd, hash, pack_tmp_name,
nr_written, hash, offset);
@@ cache.h: void reset_shared_repository(void);
extern char *git_replace_ref_base;
-extern int fsync_object_files;
+-extern int use_fsync;
+/*
+ * These values are used to help identify parts of a repository to fsync.
+ * FSYNC_COMPONENT_NONE identifies data that will not be a persistent part of the
+ * repository and so shouldn't be fsynced.
+ */
+enum fsync_component {
-+ FSYNC_COMPONENT_NONE = 0,
++ FSYNC_COMPONENT_NONE,
+ FSYNC_COMPONENT_LOOSE_OBJECT = 1 << 0,
+ FSYNC_COMPONENT_PACK = 1 << 1,
+ FSYNC_COMPONENT_PACK_METADATA = 1 << 2,
@@ cache.h: void reset_shared_repository(void);
enum fsync_method {
FSYNC_METHOD_FSYNC,
+@@ cache.h: enum fsync_method {
+ };
+
+ extern enum fsync_method fsync_method;
++extern int use_fsync;
+ extern int core_preload_index;
+ extern int precomposed_unicode;
+ extern int protect_hfs;
@@ cache.h: int copy_file_with_time(const char *dst, const char *src, int mode);
void write_or_die(int fd, const void *buf, size_t count);
void fsync_or_die(int fd, const char *);
@@ csum-file.h: int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint
void crc32_begin(struct hashfile *);
## environment.c ##
-@@ environment.c: const char *git_hooks_path;
+@@ environment.c: const char *git_attributes_file;
+ const char *git_hooks_path;
int zlib_compression_level = Z_BEST_SPEED;
int pack_compression_level = Z_DEFAULT_COMPRESSION;
+-int fsync_object_files;
+ int use_fsync = -1;
enum fsync_method fsync_method = FSYNC_METHOD_DEFAULT;
+enum fsync_component fsync_components = FSYNC_COMPONENTS_DEFAULT;
size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
@@ midx.c: static int write_midx_internal(const char *object_dir,
## object-file.c ##
@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
- /* Finalize a file on disk, and close it. */
static void close_loose_object(int fd)
{
-- if (fsync_object_files)
-- fsync_or_die(fd, "loose object file");
-+ fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd, "loose object file");
+ if (!the_repository->objects->odb->will_destroy) {
+- if (fsync_object_files)
+- fsync_or_die(fd, "loose object file");
++ fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd, "loose object file");
+ }
+
if (close(fd) != 0)
- die_errno(_("error when closing loose object file"));
- }
## pack-bitmap-write.c ##
@@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
3: 2207950beba = 3: f217dba77a1 core.fsync: new option to harden the index
4: a830d177d4c = 4: 5c22a41c1f3 core.fsync: add a `derived-metadata` aggregate option
--
gitgitgadget
next prev parent reply other threads:[~2022-02-01 3:33 UTC|newest]
Thread overview: 122+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-04 3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04 3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06 7:54 ` Neeraj Singh
2021-12-04 3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44 ` Patrick Steinhardt
2021-12-07 12:14 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29 ` Neeraj Singh
2021-12-07 12:18 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53 ` Patrick Steinhardt
2021-12-07 20:46 ` Neeraj Singh
2021-12-07 12:29 ` Ævar Arnfjörð Bjarmason
2021-12-07 21:44 ` Neeraj Singh
2021-12-08 10:05 ` Ævar Arnfjörð Bjarmason
2021-12-09 0:14 ` Neeraj Singh
2021-12-09 0:44 ` Junio C Hamano
2021-12-09 4:08 ` Ævar Arnfjörð Bjarmason
2021-12-09 6:18 ` Neeraj Singh
2022-01-18 23:50 ` Neeraj Singh
2022-01-19 15:28 ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52 ` Ævar Arnfjörð Bjarmason
2022-01-28 1:28 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08 0:44 ` Neeraj Singh
2021-12-09 0:57 ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08 1:13 ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09 0:55 ` rsbecker
2022-01-10 19:00 ` Neeraj Singh
2022-02-01 3:33 ` Neeraj K. Singh via GitGitGadget [this message]
2022-02-01 3:33 ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02 0:51 ` Junio C Hamano
2022-02-02 1:42 ` Junio C Hamano
2022-02-11 21:18 ` Neeraj Singh
2022-02-11 22:19 ` Junio C Hamano
2022-02-11 23:04 ` Neeraj Singh
2022-02-11 23:15 ` Junio C Hamano
2022-02-12 0:39 ` rsbecker
2022-02-14 7:04 ` Patrick Steinhardt
2022-02-14 17:17 ` Junio C Hamano
2022-03-09 13:42 ` Patrick Steinhardt
2022-03-09 18:50 ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03 ` Junio C Hamano
2022-03-10 12:33 ` Patrick Steinhardt
2022-03-10 17:15 ` Junio C Hamano
2022-03-09 20:05 ` Neeraj Singh
2022-02-11 20:38 ` Neeraj Singh
2022-02-01 3:33 ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29 ` Junio C Hamano
2022-03-10 1:21 ` Neeraj Singh
2022-03-10 1:26 ` brian m. carlson
2022-03-10 1:56 ` Neeraj Singh
2022-03-09 23:03 ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10 0:21 ` Junio C Hamano
2022-03-10 2:53 ` Neeraj Singh
2022-03-10 7:19 ` Junio C Hamano
2022-03-10 18:38 ` Neeraj Singh
2022-03-10 18:44 ` Junio C Hamano
2022-03-10 19:57 ` Junio C Hamano
2022-03-10 20:25 ` Neeraj Singh
2022-03-10 21:17 ` Junio C Hamano
2022-03-10 13:11 ` Johannes Schindelin
2022-03-10 17:18 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10 9:53 ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10 9:53 ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34 ` Junio C Hamano
2022-03-10 18:40 ` Neeraj Singh
2022-03-10 9:53 ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25 ` Junio C Hamano
2022-03-10 19:03 ` Neeraj Singh
2022-03-10 22:54 ` Neeraj Singh
2022-03-11 6:40 ` Junio C Hamano
2022-03-11 9:15 ` Patrick Steinhardt
2022-03-11 9:36 ` Ævar Arnfjörð Bjarmason
2022-03-10 9:53 ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28 ` Junio C Hamano
2022-03-11 9:10 ` Patrick Steinhardt
2022-03-10 22:43 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06 ` Jiang Xin
2022-03-28 19:45 ` Neeraj Singh
2022-03-10 22:43 ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12 ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32 ` Junio C Hamano
2022-03-15 19:56 ` Neeraj Singh
2022-03-23 14:20 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Ævar Arnfjörð Bjarmason
2022-03-25 21:24 ` Neeraj Singh
2022-03-26 0:24 ` Ævar Arnfjörð Bjarmason
2022-03-26 1:23 ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26 1:25 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31 ` Ævar Arnfjörð Bjarmason
2022-03-27 5:27 ` Neeraj Singh
2022-03-27 12:43 ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56 ` Patrick Steinhardt
2022-03-28 11:25 ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56 ` Neeraj Singh
2022-03-30 16:59 ` Neeraj Singh
2022-03-10 23:34 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11 0:03 ` Neeraj Singh
2022-03-11 18:50 ` Neeraj Singh
2022-03-13 23:50 ` Junio C Hamano
2022-03-11 9:58 ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25 6:11 ` SZEDER Gábor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1093.v4.git.1643686424.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=bagasdotme@gmail.com \
--cc=git@vger.kernel.org \
--cc=neerajsi@microsoft.com \
--cc=newren@gmail.com \
--cc=nksingh85@gmail.com \
--cc=ps@pks.im \
--cc=rsbecker@nexbridge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).