From: "Neeraj K. Singh via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: rsbecker@nexbridge.com, bagasdotme@gmail.com, newren@gmail.com,
avarab@gmail.com, nksingh85@gmail.com, ps@pks.im,
"Neeraj K. Singh" <neerajsi@microsoft.com>
Subject: [PATCH v4 0/4] A design for future-proofing fsync() configuration
Date: Tue, 01 Feb 2022 03:33:40 +0000 [thread overview]
Message-ID: <pull.1093.v4.git.1643686424.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1093.v3.git.1639011433.gitgitgadget@gmail.com>
This is an implementation of an extensible configuration mechanism for
fsyncing persistent components of a repo.
The main goals are to separate the "what" to sync from the "how". There are
now two settings: core.fsync - Control the 'what', including the index.
core.fsyncMethod - Control the 'how'. Currently we support writeout-only and
full fsync.
Syncing of refs can be layered on top of core.fsync. And batch mode will be
layered on core.fsyncMethod.
core.fsyncObjectfiles is removed and will issue a deprecation warning if
it's seen.
I'd like to get agreement on this direction before submitting batch mode to
the list. The batch mode series is available to view at
https://github.com/gitgitgadget/git/pull/1134
Please see [1], [2], and [3] for discussions that led to this series.
After this change, new persistent data files added to the repo will need to
be added to the fsync_component enum and documented in the
Documentation/config/core.txt text.
V4 changes:
* Rebase onto master at b23dac905bd.
* Add a comment to write_pack_file indicating why we don't fsync when
writing to stdout.
* I kept the configuration schema as-is rather than switching to
multi-value. The thinking here is that a stateless last-one-wins config
schema (comma separated) will make it easier to achieve some holistic
self-consistent fsync configuration for a particular repo.
V3 changes:
* Remove relative path from git-compat-util.h include [4].
* Updated newly added warning texts to have more context for localization
[4].
* Fixed tab spacing in enum fsync_action
* Moved the fsync looping out to a helper and do it consistently. [4]
* Changed commit description to use camelCase for config names. [5]
* Add an optional fourth patch with derived-metadata so that the user can
exclude a forward-compatible set of things that should be recomputable
given existing data.
V2 changes:
* Updated the documentation for core.fsyncmethod to be less certain.
writeout-only probably does not do the right thing on Linux.
* Split out the core.fsync=index change into its own commit.
* Rename REPO_COMPONENT to FSYNC_COMPONENT. This is really specific to
fsyncing, so the name should reflect that.
* Re-add missing Makefile change for SYNC_FILE_RANGE.
* Tested writeout-only mode, index syncing, and general config settings.
[1] https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
[2]
https://lore.kernel.org/git/dd65718814011eb93ccc4428f9882e0f025224a6.1636029491.git.ps@pks.im/
[3]
https://lore.kernel.org/git/pull.1076.git.git.1629856292.gitgitgadget@gmail.com/
[4]
https://lore.kernel.org/git/CANQDOdf8C4-haK9=Q_J4Cid8bQALnmGDm=SvatRbaVf+tkzqLw@mail.gmail.com/
[5] https://lore.kernel.org/git/211207.861r2opplg.gmgdl@evledraar.gmail.com/
Neeraj Singh (4):
core.fsyncmethod: add writeout-only mode
core.fsync: introduce granular fsync control
core.fsync: new option to harden the index
core.fsync: add a `derived-metadata` aggregate option
Documentation/config/core.txt | 35 ++++++++---
Makefile | 6 ++
builtin/fast-import.c | 2 +-
builtin/index-pack.c | 4 +-
builtin/pack-objects.c | 24 +++++---
bulk-checkin.c | 5 +-
cache.h | 49 +++++++++++++++-
commit-graph.c | 3 +-
compat/mingw.h | 3 +
compat/win32/flush.c | 28 +++++++++
config.c | 90 ++++++++++++++++++++++++++++-
config.mak.uname | 3 +
configure.ac | 8 +++
contrib/buildsystems/CMakeLists.txt | 3 +-
csum-file.c | 5 +-
csum-file.h | 3 +-
environment.c | 3 +-
git-compat-util.h | 24 ++++++++
midx.c | 3 +-
object-file.c | 3 +-
pack-bitmap-write.c | 3 +-
pack-write.c | 13 +++--
read-cache.c | 19 ++++--
wrapper.c | 64 ++++++++++++++++++++
write-or-die.c | 11 ++--
25 files changed, 367 insertions(+), 47 deletions(-)
create mode 100644 compat/win32/flush.c
base-commit: b23dac905bde28da47543484320db16312c87551
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1093%2Fneerajsi-msft%2Fns%2Fcore-fsync-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1093/neerajsi-msft/ns/core-fsync-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1093
Range-diff vs v3:
1: 15edfe51509 ! 1: 51a218d100d core.fsyncmethod: add writeout-only mode
@@ Makefile: ifdef HAVE_CLOCK_MONOTONIC
endif
## cache.h ##
-@@ cache.h: extern int read_replace_refs;
- extern char *git_replace_ref_base;
+@@ cache.h: extern char *git_replace_ref_base;
extern int fsync_object_files;
+ extern int use_fsync;
+
+enum fsync_method {
+ FSYNC_METHOD_FSYNC,
@@ compat/win32/flush.c (new)
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
-+ DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
++ DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NTAPI, NtFlushBuffersFileEx,
+ HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+ PIO_STATUS_BLOCK IoStatusBlock);
+
@@ contrib/buildsystems/CMakeLists.txt: if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
compat/nedmalloc/nedmalloc.c compat/strdup.c)
## environment.c ##
-@@ environment.c: const char *git_attributes_file;
- const char *git_hooks_path;
- int zlib_compression_level = Z_BEST_SPEED;
+@@ environment.c: int zlib_compression_level = Z_BEST_SPEED;
int pack_compression_level = Z_DEFAULT_COMPRESSION;
--int fsync_object_files;
+ int fsync_object_files;
+ int use_fsync = -1;
+enum fsync_method fsync_method = FSYNC_METHOD_DEFAULT;
size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
@@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
int err;
## write-or-die.c ##
-@@ write-or-die.c: void fprintf_or_die(FILE *f, const char *fmt, ...)
-
- void fsync_or_die(int fd, const char *msg)
- {
+@@ write-or-die.c: void fsync_or_die(int fd, const char *msg)
+ use_fsync = git_env_bool("GIT_TEST_FSYNC", 1);
+ if (!use_fsync)
+ return;
- while (fsync(fd) < 0) {
- if (errno != EINTR)
- die_errno("fsync error on '%s'", msg);
- }
++
+ if (fsync_method == FSYNC_METHOD_WRITEOUT_ONLY &&
+ git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0)
+ return;
2: 080be1a6f64 ! 2: 7a164ba9571 core.fsync: introduce granular fsync control
@@ builtin/index-pack.c: static void final(const char *final_pack_name, const char
## builtin/pack-objects.c ##
@@ builtin/pack-objects.c: static void write_pack_file(void)
- * If so, rewrite it like in fast-import
- */
+ display_progress(progress_state, written);
+ }
+
+- /*
+- * Did we write the wrong # entries in the header?
+- * If so, rewrite it like in fast-import
+- */
if (pack_to_stdout) {
- finalize_hashfile(f, hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
++ /*
++ * We never fsync when writing to stdout since we may
++ * not be writing to an actual pack file. For instance,
++ * the upload-pack code passes a pipe here. Calling
++ * fsync on a pipe results in unnecessary
++ * synchronization with the reader on some platforms.
++ */
+ finalize_hashfile(f, hash, FSYNC_COMPONENT_NONE,
+ CSUM_HASH_IN_STREAM | CSUM_CLOSE);
} else if (nr_written == nr_remaining) {
@@ builtin/pack-objects.c: static void write_pack_file(void)
+ CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
} else {
- int fd = finalize_hashfile(f, hash, 0);
++ /*
++ * If we wrote the wrong number of entries in the
++ * header, rewrite it like in fast-import.
++ */
++
+ int fd = finalize_hashfile(f, hash, FSYNC_COMPONENT_PACK, 0);
fixup_pack_header_footer(fd, hash, pack_tmp_name,
nr_written, hash, offset);
@@ cache.h: void reset_shared_repository(void);
extern char *git_replace_ref_base;
-extern int fsync_object_files;
+-extern int use_fsync;
+/*
+ * These values are used to help identify parts of a repository to fsync.
+ * FSYNC_COMPONENT_NONE identifies data that will not be a persistent part of the
+ * repository and so shouldn't be fsynced.
+ */
+enum fsync_component {
-+ FSYNC_COMPONENT_NONE = 0,
++ FSYNC_COMPONENT_NONE,
+ FSYNC_COMPONENT_LOOSE_OBJECT = 1 << 0,
+ FSYNC_COMPONENT_PACK = 1 << 1,
+ FSYNC_COMPONENT_PACK_METADATA = 1 << 2,
@@ cache.h: void reset_shared_repository(void);
enum fsync_method {
FSYNC_METHOD_FSYNC,
+@@ cache.h: enum fsync_method {
+ };
+
+ extern enum fsync_method fsync_method;
++extern int use_fsync;
+ extern int core_preload_index;
+ extern int precomposed_unicode;
+ extern int protect_hfs;
@@ cache.h: int copy_file_with_time(const char *dst, const char *src, int mode);
void write_or_die(int fd, const void *buf, size_t count);
void fsync_or_die(int fd, const char *);
@@ csum-file.h: int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint
void crc32_begin(struct hashfile *);
## environment.c ##
-@@ environment.c: const char *git_hooks_path;
+@@ environment.c: const char *git_attributes_file;
+ const char *git_hooks_path;
int zlib_compression_level = Z_BEST_SPEED;
int pack_compression_level = Z_DEFAULT_COMPRESSION;
+-int fsync_object_files;
+ int use_fsync = -1;
enum fsync_method fsync_method = FSYNC_METHOD_DEFAULT;
+enum fsync_component fsync_components = FSYNC_COMPONENTS_DEFAULT;
size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
@@ midx.c: static int write_midx_internal(const char *object_dir,
## object-file.c ##
@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
- /* Finalize a file on disk, and close it. */
static void close_loose_object(int fd)
{
-- if (fsync_object_files)
-- fsync_or_die(fd, "loose object file");
-+ fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd, "loose object file");
+ if (!the_repository->objects->odb->will_destroy) {
+- if (fsync_object_files)
+- fsync_or_die(fd, "loose object file");
++ fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd, "loose object file");
+ }
+
if (close(fd) != 0)
- die_errno(_("error when closing loose object file"));
- }
## pack-bitmap-write.c ##
@@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
3: 2207950beba = 3: f217dba77a1 core.fsync: new option to harden the index
4: a830d177d4c = 4: 5c22a41c1f3 core.fsync: add a `derived-metadata` aggregate option
--
gitgitgadget
next prev parent reply other threads:[~2022-02-01 3:33 UTC|newest]
Thread overview: 122+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-04 3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04 3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06 7:54 ` Neeraj Singh
2021-12-04 3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07 2:46 ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44 ` Patrick Steinhardt
2021-12-07 12:14 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29 ` Neeraj Singh
2021-12-07 12:18 ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53 ` Patrick Steinhardt
2021-12-07 20:46 ` Neeraj Singh
2021-12-07 12:29 ` Ævar Arnfjörð Bjarmason
2021-12-07 21:44 ` Neeraj Singh
2021-12-08 10:05 ` Ævar Arnfjörð Bjarmason
2021-12-09 0:14 ` Neeraj Singh
2021-12-09 0:44 ` Junio C Hamano
2021-12-09 4:08 ` Ævar Arnfjörð Bjarmason
2021-12-09 6:18 ` Neeraj Singh
2022-01-18 23:50 ` Neeraj Singh
2022-01-19 15:28 ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52 ` Ævar Arnfjörð Bjarmason
2022-01-28 1:28 ` Neeraj Singh
2021-12-07 2:46 ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08 0:44 ` Neeraj Singh
2021-12-09 0:57 ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09 0:57 ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08 1:13 ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09 0:55 ` rsbecker
2022-01-10 19:00 ` Neeraj Singh
2022-02-01 3:33 ` Neeraj K. Singh via GitGitGadget [this message]
2022-02-01 3:33 ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02 0:51 ` Junio C Hamano
2022-02-02 1:42 ` Junio C Hamano
2022-02-11 21:18 ` Neeraj Singh
2022-02-11 22:19 ` Junio C Hamano
2022-02-11 23:04 ` Neeraj Singh
2022-02-11 23:15 ` Junio C Hamano
2022-02-12 0:39 ` rsbecker
2022-02-14 7:04 ` Patrick Steinhardt
2022-02-14 17:17 ` Junio C Hamano
2022-03-09 13:42 ` Patrick Steinhardt
2022-03-09 18:50 ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03 ` Junio C Hamano
2022-03-10 12:33 ` Patrick Steinhardt
2022-03-10 17:15 ` Junio C Hamano
2022-03-09 20:05 ` Neeraj Singh
2022-02-11 20:38 ` Neeraj Singh
2022-02-01 3:33 ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01 3:33 ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29 ` Junio C Hamano
2022-03-10 1:21 ` Neeraj Singh
2022-03-10 1:26 ` brian m. carlson
2022-03-10 1:56 ` Neeraj Singh
2022-03-09 23:03 ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10 0:21 ` Junio C Hamano
2022-03-10 2:53 ` Neeraj Singh
2022-03-10 7:19 ` Junio C Hamano
2022-03-10 18:38 ` Neeraj Singh
2022-03-10 18:44 ` Junio C Hamano
2022-03-10 19:57 ` Junio C Hamano
2022-03-10 20:25 ` Neeraj Singh
2022-03-10 21:17 ` Junio C Hamano
2022-03-10 13:11 ` Johannes Schindelin
2022-03-10 17:18 ` Junio C Hamano
2022-03-09 23:03 ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03 ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10 9:53 ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10 9:53 ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34 ` Junio C Hamano
2022-03-10 18:40 ` Neeraj Singh
2022-03-10 9:53 ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25 ` Junio C Hamano
2022-03-10 19:03 ` Neeraj Singh
2022-03-10 22:54 ` Neeraj Singh
2022-03-11 6:40 ` Junio C Hamano
2022-03-11 9:15 ` Patrick Steinhardt
2022-03-11 9:36 ` Ævar Arnfjörð Bjarmason
2022-03-10 9:53 ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28 ` Junio C Hamano
2022-03-11 9:10 ` Patrick Steinhardt
2022-03-10 22:43 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06 ` Jiang Xin
2022-03-28 19:45 ` Neeraj Singh
2022-03-10 22:43 ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43 ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12 ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32 ` Junio C Hamano
2022-03-15 19:56 ` Neeraj Singh
2022-03-23 14:20 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Ævar Arnfjörð Bjarmason
2022-03-25 21:24 ` Neeraj Singh
2022-03-26 0:24 ` Ævar Arnfjörð Bjarmason
2022-03-26 1:23 ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26 1:25 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31 ` Ævar Arnfjörð Bjarmason
2022-03-27 5:27 ` Neeraj Singh
2022-03-27 12:43 ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56 ` Patrick Steinhardt
2022-03-28 11:25 ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56 ` Neeraj Singh
2022-03-30 16:59 ` Neeraj Singh
2022-03-10 23:34 ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11 0:03 ` Neeraj Singh
2022-03-11 18:50 ` Neeraj Singh
2022-03-13 23:50 ` Junio C Hamano
2022-03-11 9:58 ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25 6:11 ` SZEDER Gábor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1093.v4.git.1643686424.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=bagasdotme@gmail.com \
--cc=git@vger.kernel.org \
--cc=neerajsi@microsoft.com \
--cc=newren@gmail.com \
--cc=nksingh85@gmail.com \
--cc=ps@pks.im \
--cc=rsbecker@nexbridge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.