What's cooking in git.git (Jan 2024, #01; Tue, 2)

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* What's cooking in git.git (Jan 2024, #01; Tue, 2)
@ 2024-01-03  1:02 Junio C Hamano
  2024-01-03  5:53 ` ps/refstorage-extension (was: What's cooking in git.git (Jan 2024, #01; Tue, 2)) Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Junio C Hamano @ 2024-01-03  1:02 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and are candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen', and
aren't considered "accepted" at all and may be annotated with an URL
to a message that raises issues but they are no means exhaustive.  A
topic without enough support may be discarded after a long period of
no activity (of course they can be resubmit when new interests
arise).

As the past couple of weeks have been slow, topics that have been
merged to 'next' (and to a lessor extent, to 'master') this season
may have seen less scrutinizing eyes than they deserve, which might
lead bugs and regressions caused by them discovered later, but that
is a regular part of life.  Let's see what happens.

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* jc/orphan-unborn (2023-11-24) 2 commits
  (merged to 'next' on 2023-12-21 at 030300487a)
 + orphan/unborn: fix use of 'orphan' in end-user facing messages
 + orphan/unborn: add to the glossary and use them consistently

 Doc updates to clarify what an "unborn branch" means.
 source: <xmqq4jhb977x.fsf@gitster.g>

* jc/retire-cas-opt-name-constant (2023-12-19) 1 commit
  (merged to 'next' on 2023-12-21 at 39ef057c8b)
 + remote.h: retire CAS_OPT_NAME

 Code clean-up.
 source: <xmqq5y0uc7tq.fsf@gitster.g>

* la/trailer-cleanups (2023-12-20) 3 commits
  (merged to 'next' on 2023-12-21 at e26ede5f55)
 + trailer: use offsets for trailer_start/trailer_end
 + trailer: find the end of the log message
 + commit: ignore_non_trailer computes number of bytes to ignore

 Code clean-up.
 source: <pull.1563.v5.git.1697828495.gitgitgadget@gmail.com>

* ps/pseudo-refs (2023-12-14) 4 commits
  (merged to 'next' on 2023-12-21 at 3460e3d667)
 + bisect: consistently write BISECT_EXPECTED_REV via the refdb
 + refs: complete list of special refs
 + refs: propagate errno when reading special refs fails
 + wt-status: read HEAD and ORIG_HEAD via the refdb

 Assorted changes around pseudoref handling.
 source: <cover.1702560829.git.ps@pks.im>

* rj/status-bisect-while-rebase (2023-10-16) 1 commit
  (merged to 'next' on 2023-12-21 at 929a027fb7)
 + status: fix branch shown when not only bisecting

 "git status" is taught to show both the branch being bisected and
 being rebased when both are in effect at the same time.
 cf. <xmqqil76kyov.fsf@gitster.g>
 source: <2e24ca9b-9c5f-f4df-b9f8-6574a714dfb2@gmail.com>

* rs/rebase-use-strvec-pushf (2023-12-20) 1 commit
  (merged to 'next' on 2023-12-20 at ecb190973c)
 + rebase: use strvec_pushf() for format-patch revisions

 Code clean-up.
 source: <4ab7431c-6c1b-448c-b4d2-e8b9be0e4eef@web.de>

* sh/completion-with-reftable (2023-12-19) 2 commits
  (merged to 'next' on 2023-12-20 at 7957d4aa5b)
 + completion: support pseudoref existence checks for reftables
 + completion: refactor existence checks for pseudorefs

 Command line completion script (in contrib/) learned to work better
 with the reftable backend.
 source: <cover.1703022850.git.stanhu@gmail.com>

--------------------------------------------------
[New Topics]

* cp/sideband-array-index-comment-fix (2023-12-28) 1 commit
 - sideband.c: remove redundant 'NEEDSWORK' tag

 In-code comment fix.
 source: <pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com>

* ps/worktree-refdb-initialization (2023-12-28) 6 commits
 - builtin/worktree: create refdb via ref backend
 - worktree: expose interface to look up worktree by name
 - builtin/worktree: move setup of commondir file earlier
 - refs/files: skip creation of "refs/{heads,tags}" for worktrees
 - setup: move creation of "refs/" into the files backend
 - refs: prepare `refs_init_db()` for initializing worktree refs

 Instead of manually creating refs/ hierarchy on disk upon a
 creation of a secondary worktree, which is only usable via the
 files backend, use the refs API to populate it.

 May want to wait until other topics solidify a bit more.
 cf. <xmqqedf6gpt8.fsf@gitster.g>
 source: <cover.1703754513.git.ps@pks.im>

--------------------------------------------------
[Cooking]

* ml/doc-merge-updates (2023-12-20) 2 commits
  (merged to 'next' on 2023-12-28 at 7a6329a219)
 + Documentation/git-merge.txt: use backticks for command wrapping
 + Documentation/git-merge.txt: fix reference to synopsis

 Doc update.

 Will merge to 'master'.
 source: <20231220195342.17590-1-mi.al.lohmann@gmail.com>

* cp/apply-core-filemode (2023-12-26) 3 commits
 - apply: code simplification
 - apply: correctly reverse patch's pre- and post-image mode bits
 - apply: ignore working tree filemode when !core.filemode

 "git apply" on a filesystem without filemode support have learned
 to take a hint from what is in the index for the path, even when
 not working with the "--index" or "--cached" option, when checking
 the executable bit match what is required by the preimage in the
 patch.

 Needs review.
 source: <20231226233218.472054-1-gitster@pobox.com>

* jc/archive-list-with-extra-args (2023-12-21) 1 commit
  (merged to 'next' on 2023-12-28 at 2d5c20e67f)
 + archive: "--list" does not take further options

 "git archive --list extra garbage" silently ignored excess command
 line parameters, which has been corrected.

 Will merge to 'master'.
 source: <xmqqmsu3mnix.fsf@gitster.g>

* jk/t1006-cat-file-objectsize-disk (2023-12-21) 1 commit
  (merged to 'next' on 2023-12-28 at d82812e636)
 + t1006: add tests for %(objectsize:disk)

 Test update.

 Will merge to 'master'.
 source: <20231221094722.GA570888@coredump.intra.peff.net>

* js/contributor-docs-updates (2023-12-27) 9 commits
  (merged to 'next' on 2024-01-02 at 0e072117cd)
 + SubmittingPatches: hyphenate non-ASCII
 + SubmittingPatches: clarify GitHub artifact format
 + SubmittingPatches: clarify GitHub visual
 + SubmittingPatches: provide tag naming advice
 + SubmittingPatches: update extra tags list
 + SubmittingPatches: discourage new trailers
 + SubmittingPatches: drop ref to "What's in git.git"
 + CodingGuidelines: write punctuation marks
 + CodingGuidelines: move period inside parentheses

 Doc update.

 Will merge to 'master'.
 source: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>

* al/unit-test-ctype (2024-01-02) 1 commit
 - unit-tests: rewrite t/helper/test-ctype.c as a unit test

 Move test-ctype helper to the unit-test framework.
 source: <20240101104017.9452-2-ach.lumap@gmail.com>

* bk/bisect-doc-fix (2023-12-27) 1 commit
 - doc: use singular form of repeatable path arg

 Synopsis fix.

 Expecting a reroll.
 source: <3d46bca1-96d4-43ba-a912-1f7c76942287@smtp-relay.sendinblue.com>

* en/sparse-checkout-eoo (2023-12-26) 2 commits
  (merged to 'next' on 2023-12-28 at 3ddf2ebab6)
 + sparse-checkout: be consistent with end of options markers
 + Merge branch 'jk/end-of-options' into jc/sparse-checkout-set-add-end-of-options

 "git sparse-checkout (add|set) --[no-]cone --end-of-options" did
 not handle "--end-of-options" correctly after a recent update.

 Will merge to 'master'.
 source: <pull.1625.v2.git.git.1703619562639.gitgitgadget@gmail.com>

* ja/doc-placeholders-fix (2023-12-26) 2 commits
 - doc: enforce placeholders in documentation
 - doc: enforce dashes in placeholders

 Docfix.

 Needs review.
 source: <pull.1626.git.1703539287.gitgitgadget@gmail.com>

* jc/sparse-checkout-set-default-fix (2023-12-26) 1 commit
  (merged to 'next' on 2023-12-28 at a558eccf8e)
 + sparse-checkout: use default patterns for 'set' only !stdin

 "git sparse-checkout set" added default patterns even when the
 patterns are being fed from the standard input, which has been
 corrected.

 Will merge to 'master'.
 source: <20231221065925.3234048-3-gitster@pobox.com>

* rs/fast-import-simplify-mempool-allocation (2023-12-26) 1 commit
  (merged to 'next' on 2023-12-28 at 16e6dd2a69)
 + fast-import: use mem_pool_calloc()

 Code simplification.

 Will merge to 'master'.
 source: <50c1f410-ca37-4c1c-a28b-3e9fad49f2b4@web.de>

* rs/mem-pool-improvements (2023-12-28) 2 commits
  (merged to 'next' on 2023-12-28 at aa03d9441c)
 + mem-pool: simplify alignment calculation
 + mem-pool: fix big allocations

 MemPool allocator fixes.

 Will merge to 'master'.
 source: <fa89d269-1a23-4ed6-bebc-30c0b629f444@web.de>

* ps/refstorage-extension (2024-01-02) 13 commits
 - t9500: write "extensions.refstorage" into config
 - builtin/clone: introduce `--ref-format=` value flag
 - builtin/init: introduce `--ref-format=` value flag
 - builtin/rev-parse: introduce `--show-ref-format` flag
 - t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
 - setup: introduce GIT_DEFAULT_REF_FORMAT envvar
 - setup: introduce "extensions.refStorage" extension
 - setup: set repository's formats on init
 - setup: start tracking ref storage format
 - refs: refactor logic to look up storage backends
 - worktree: skip reading HEAD when repairing worktrees
 - t: introduce DEFAULT_REPO_FORMAT prereq
 - Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension

 Introduce a new extension "refstorage" so that we can mark a
 repository that uses a non-default ref backend, like reftable.

 Will merge to 'next'?
 source: <cover.1703833818.git.ps@pks.im>

* ps/reftable-fixes-and-optims (2023-12-28) 9 commits
 - reftable/merged: transfer ownership of records when iterating
 - reftable/merged: really reuse buffers to compute record keys
 - reftable/record: store "val2" hashes as static arrays
 - reftable/record: store "val1" hashes as static arrays
 - reftable/record: constify some parts of the interface
 - reftable/writer: fix index corruption when writing multiple indices
 - reftable/stack: do not auto-compact twice in `reftable_stack_add()`
 - reftable/stack: do not overwrite errors when compacting
 - Merge branch 'ps/reftable-fixes' into ps/reftable-fixes-and-optims

 More fixes and optimizations to the reftable backend.

 Expecting a (hopefully minor and final) reroll.
 cf. <xmqqtto2gswa.fsf@gitster.g>
 source: <cover.1703743174.git.ps@pks.im>

* tb/multi-pack-verbatim-reuse (2023-12-14) 26 commits
 - t/perf: add performance tests for multi-pack reuse
 - pack-bitmap: enable reuse from all bitmapped packs
 - pack-objects: allow setting `pack.allowPackReuse` to "single"
 - t/test-lib-functions.sh: implement `test_trace2_data` helper
 - pack-objects: add tracing for various packfile metrics
 - pack-bitmap: prepare to mark objects from multiple packs for reuse
 - pack-revindex: implement `midx_pair_to_pack_pos()`
 - pack-revindex: factor out `midx_key_to_pack_pos()` helper
 - midx: implement `midx_preferred_pack()`
 - git-compat-util.h: implement checked size_t to uint32_t conversion
 - pack-objects: include number of packs reused in output
 - pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
 - pack-objects: prepare `write_reused_pack()` for multi-pack reuse
 - pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
 - pack-objects: keep track of `pack_start` for each reuse pack
 - pack-objects: parameterize pack-reuse routines over a single pack
 - pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
 - pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
 - ewah: implement `bitmap_is_empty()`
 - pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
 - midx: implement `midx_locate_pack()`
 - midx: implement `BTMP` chunk
 - midx: factor out `fill_pack_info()`
 - pack-bitmap: plug leak in find_objects()
 - pack-bitmap-write: deep-clear the `bb_commit` slab
 - pack-objects: free packing_data in more places

 Streaming spans of packfile data used to be done only from a
 single, primary, pack in a repository with multiple packfiles.  It
 has been extended to allow reuse from other packfiles, too.

 Will merge to 'next'?
 cf. <ZXurD1NTZ4TAs7WZ@nand.local>
 source: <cover.1702592603.git.me@ttaylorr.com>

* jc/bisect-doc (2023-12-09) 1 commit
 - bisect: document "terms" subcommand more fully

 Doc update.

 Needs review.
 source: <xmqqzfyjmk02.fsf@gitster.g>

* en/header-cleanup (2023-12-26) 12 commits
  (merged to 'next' on 2023-12-28 at 1ccddc2a10)
 + treewide: remove unnecessary includes in source files
 + treewide: add direct includes currently only pulled in transitively
 + trace2/tr2_tls.h: remove unnecessary include
 + submodule-config.h: remove unnecessary include
 + pkt-line.h: remove unnecessary include
 + line-log.h: remove unnecessary include
 + http.h: remove unnecessary include
 + fsmonitor--daemon.h: remove unnecessary includes
 + blame.h: remove unnecessary includes
 + archive.h: remove unnecessary include
 + treewide: remove unnecessary includes in source files
 + treewide: remove unnecessary includes from header files

 Remove unused header "#include".

 Will merge to 'master'.
 source: <pull.1617.v2.git.1703351700.gitgitgadget@gmail.com>

* jw/builtin-objectmode-attr (2023-12-28) 1 commit
  (merged to 'next' on 2024-01-02 at 4c3784b3a1)
 + attr: add builtin objectmode values support

 The builtin_objectmode attribute is populated for each path
 without adding anything in .gitattributes files, which would be
 useful in magic pathspec, e.g., ":(attr:builtin_objectmode=100755)"
 to limit to executables.

 Will merge to 'master'.
 cf. <xmqq5y0ssknj.fsf@gitster.g>
 source: <20231116054437.2343549-1-jojwang@google.com>

* tb/merge-tree-write-pack (2023-10-23) 5 commits
 - builtin/merge-tree.c: implement support for `--write-pack`
 - bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
 - bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
 - bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
 - bulk-checkin: extract abstract `bulk_checkin_source`

 "git merge-tree" learned "--write-pack" to record its result
 without creating loose objects.

 Broken when an object created during a merge is needed to continue merge
 cf. <CABPp-BEfy9VOvimP9==ry_rZXu=metOQ8s=_-XiG_Pdx9c06Ww@mail.gmail.com>
 source: <cover.1698101088.git.me@ttaylorr.com>

* tb/pair-chunk-expect (2023-11-10) 8 commits
 - midx: read `OOFF` chunk with `pair_chunk_expect()`
 - midx: read `OIDL` chunk with `pair_chunk_expect()`
 - commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
 - commit-graph: read `GDAT` chunk with `pair_chunk_expect()`
 - commit-graph: read `CDAT` chunk with `pair_chunk_expect()`
 - commit-graph: read `OIDL` chunk with `pair_chunk_expect()`
 - chunk-format: introduce `pair_chunk_expect()` helper
 - Merge branch 'jk/chunk-bounds-more' into HEAD

 Further code clean-up.

 Needs review.
 source: <cover.1699569246.git.me@ttaylorr.com>

* tb/path-filter-fix (2023-10-18) 17 commits
 - bloom: introduce `deinit_bloom_filters()`
 - commit-graph: reuse existing Bloom filters where possible
 - object.h: fix mis-aligned flag bits table
 - commit-graph: drop unnecessary `graph_read_bloom_data_context`
 - commit-graph.c: unconditionally load Bloom filters
 - bloom: prepare to discard incompatible Bloom filters
 - bloom: annotate filters with hash version
 - commit-graph: new filter ver. that fixes murmur3
 - repo-settings: introduce commitgraph.changedPathsVersion
 - t4216: test changed path filters with high bit paths
 - t/helper/test-read-graph: implement `bloom-filters` mode
 - bloom.h: make `load_bloom_filter_from_graph()` public
 - t/helper/test-read-graph.c: extract `dump_graph_info()`
 - gitformat-commit-graph: describe version 2 of BDAT
 - commit-graph: ensure Bloom filters are read with consistent settings
 - revision.c: consult Bloom filters for root commits
 - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`

 The Bloom filter used for path limited history traversal was broken
 on systems whose "char" is unsigned; update the implementation and
 bump the format version to 2.

 Expecting a reroll.
 cf. <20231023202212.GA5470@szeder.dev>
 source: <cover.1697653929.git.me@ttaylorr.com>

* ak/color-decorate-symbols (2023-10-23) 7 commits
 - log: add color.decorate.pseudoref config variable
 - refs: exempt pseudorefs from pattern prefixing
 - refs: add pseudorefs array and iteration functions
 - log: add color.decorate.ref config variable
 - log: add color.decorate.symbol config variable
 - log: use designated inits for decoration_colors
 - config: restructure color.decorate documentation

 A new config for coloring.

 Needs review.
 source: <20231023221143.72489-1-andy.koppe@gmail.com>

* eb/hash-transition (2023-10-02) 30 commits
 - t1016-compatObjectFormat: add tests to verify the conversion between objects
 - t1006: test oid compatibility with cat-file
 - t1006: rename sha1 to oid
 - test-lib: compute the compatibility hash so tests may use it
 - builtin/ls-tree: let the oid determine the output algorithm
 - object-file: handle compat objects in check_object_signature
 - tree-walk: init_tree_desc take an oid to get the hash algorithm
 - builtin/cat-file: let the oid determine the output algorithm
 - rev-parse: add an --output-object-format parameter
 - repository: implement extensions.compatObjectFormat
 - object-file: update object_info_extended to reencode objects
 - object-file-convert: convert commits that embed signed tags
 - object-file-convert: convert commit objects when writing
 - object-file-convert: don't leak when converting tag objects
 - object-file-convert: convert tag objects when writing
 - object-file-convert: add a function to convert trees between algorithms
 - object: factor out parse_mode out of fast-import and tree-walk into in object.h
 - cache: add a function to read an OID of a specific algorithm
 - tag: sign both hashes
 - commit: export add_header_signature to support handling signatures on tags
 - commit: convert mergetag before computing the signature of a commit
 - commit: write commits for both hashes
 - object-file: add a compat_oid_in parameter to write_object_file_flags
 - object-file: update the loose object map when writing loose objects
 - loose: compatibilty short name support
 - loose: add a mapping between SHA-1 and SHA-256 for loose objects
 - repository: add a compatibility hash algorithm
 - object-names: support input of oids in any supported hash
 - oid-array: teach oid-array to handle multiple kinds of oids
 - object-file-convert: stubs for converting from one object format to another

 Teach a repository to work with both SHA-1 and SHA-256 hash algorithms.

 Needs review.
 source: <878r8l929e.fsf@gmail.froward.int.ebiederm.org>

* jx/remote-archive-over-smart-http (2023-12-14) 4 commits
 - archive: support remote archive from stateless transport
 - transport-helper: call do_take_over() in connect_helper
 - transport-helper: call do_take_over() in process_connect
 - transport-helper: no connection restriction in connect_helper

 "git archive --remote=<remote>" learned to talk over the smart
 http (aka stateless) transport.

 Needs review.
 source: <cover.1702562879.git.zhiyou.jx@alibaba-inc.com>

* jx/sideband-chomp-newline-fix (2023-12-18) 3 commits
 - pkt-line: do not chomp newlines for sideband messages
 - pkt-line: memorize sideband fragment in reader
 - test-pkt-line: add option parser for unpack-sideband

 Sideband demultiplexer fixes.

 Will merge to 'next'?
 source: <cover.1702823801.git.zhiyou.jx@alibaba-inc.com>

* jc/rerere-cleanup (2023-08-25) 4 commits
 - rerere: modernize use of empty strbuf
 - rerere: try_merge() should use LL_MERGE_ERROR when it means an error
 - rerere: fix comment on handle_file() helper
 - rerere: simplify check_one_conflict() helper function

 Code clean-up.

 Not ready to be reviewed yet.
 source: <20230824205456.1231371-1-gitster@pobox.com>

--------------------------------------------------
[Discarded]

* jc/sparse-checkout-eoo (2023-12-21) 5 commits
 . sparse-checkout: tighten add_patterns_from_input()
 . sparse-checkout: use default patterns for 'set' only !stdin
 . SQUASH??? end-of-options test
 . sparse-checkout: take care of "--end-of-options" in set/add/check-rules
 + Merge branch 'jk/end-of-options' into jc/sparse-checkout-set-add-end-of-options

 "git sparse-checkout (add|set) --[no-]cone --end-of-options" did
 not handle "--end-of-options" correctly after a recent update.

 Superseded by the en/sparse-checkout-eoo topic.
 source: <20231221065925.3234048-1-gitster@pobox.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ps/refstorage-extension (was: What's cooking in git.git (Jan 2024, #01; Tue, 2))
  2024-01-03  1:02 What's cooking in git.git (Jan 2024, #01; Tue, 2) Junio C Hamano
@ 2024-01-03  5:53 ` Patrick Steinhardt
  2024-01-03  9:01 ` What's cooking in git.git (Jan 2024, #01; Tue, 2) Jeff King
  2024-01-03 16:43 ` Taylor Blau
  2 siblings, 0 replies; 20+ messages in thread
From: Patrick Steinhardt @ 2024-01-03  5:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]

On Tue, Jan 02, 2024 at 05:02:35PM -0800, Junio C Hamano wrote:
[snip]
> * ps/refstorage-extension (2024-01-02) 13 commits
>  - t9500: write "extensions.refstorage" into config
>  - builtin/clone: introduce `--ref-format=` value flag
>  - builtin/init: introduce `--ref-format=` value flag
>  - builtin/rev-parse: introduce `--show-ref-format` flag
>  - t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
>  - setup: introduce GIT_DEFAULT_REF_FORMAT envvar
>  - setup: introduce "extensions.refStorage" extension
>  - setup: set repository's formats on init
>  - setup: start tracking ref storage format
>  - refs: refactor logic to look up storage backends
>  - worktree: skip reading HEAD when repairing worktrees
>  - t: introduce DEFAULT_REPO_FORMAT prereq
>  - Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension
> 
>  Introduce a new extension "refstorage" so that we can mark a
>  repository that uses a non-default ref backend, like reftable.
> 
>  Will merge to 'next'?
>  source: <cover.1703833818.git.ps@pks.im>

It's ready from my point of view, but I'm happy to wait a few days for
people to come back from holidays.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-03  1:02 What's cooking in git.git (Jan 2024, #01; Tue, 2) Junio C Hamano
  2024-01-03  5:53 ` ps/refstorage-extension (was: What's cooking in git.git (Jan 2024, #01; Tue, 2)) Patrick Steinhardt
@ 2024-01-03  9:01 ` Jeff King
  2024-01-03 16:37   ` Junio C Hamano
  2024-01-03 17:14   ` René Scharfe
  2024-01-03 16:43 ` Taylor Blau
  2 siblings, 2 replies; 20+ messages in thread
From: Jeff King @ 2024-01-03  9:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, git

On Tue, Jan 02, 2024 at 05:02:35PM -0800, Junio C Hamano wrote:

> * jk/t1006-cat-file-objectsize-disk (2023-12-21) 1 commit
>   (merged to 'next' on 2023-12-28 at d82812e636)
>  + t1006: add tests for %(objectsize:disk)
> 
>  Test update.
> 
>  Will merge to 'master'.
>  source: <20231221094722.GA570888@coredump.intra.peff.net>

It looks like this is the original version. I posted a v2 that took
René's suggestion to swap out the awk for shell, but it got overlooked.
I'm happy enough either way, but if we want to salvage that effort,
here's a patch which could go on top:

-- >8 --
From: René Scharfe <l.s.r@web.de>
Subject: [PATCH] t1006: prefer shell loop to awk for packed object sizes

To compute the expected on-disk size of packed objects, we sort the
output of show-index by pack offset and then compute the difference
between adjacent entries using awk. This works but has a few readability
problems:

  1. Reading the index in pack order means don't find out the size of an
     oid's entry until we see the _next_ entry. So we have to save it to
     print later.

     We can instead iterate in reverse order, so we compute each oid's
     size as we see it.

  2. Since the awk invocation is inside a text_expect block, we can't
     easily use single-quotes to hold the script. So we use
     double-quotes, but then have to escape the dollar signs in the awk
     script.

     We can swap this out for a shell loop instead (which is made much
     easier by the first change).

Signed-off-by: Jeff King <peff@peff.net>
---
I gave René authorship since this was his cleverness, but obviously I
wrote the commit message. Giving an explicit signoff would be nice,
though.

 t/t1006-cat-file.sh | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 0c2eafae65..5ea3326128 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -1117,14 +1117,16 @@ test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
 		while read idx
 		do
 			git show-index <"$idx" >idx.raw &&
-			sort -n <idx.raw >idx.sorted &&
+			sort -nr <idx.raw >idx.sorted &&
 			packsz=$(test_file_size "${idx%.idx}.pack") &&
 			end=$((packsz - rawsz)) &&
-			awk -v end="$end" "
-			  NR > 1 { print oid, \$1 - start }
-			  { start = \$1; oid = \$2 }
-			  END { print oid, end - start }
-			" idx.sorted ||
+			while read start oid rest
+			do
+				size=$((end - start)) &&
+				end=$start &&
+				echo "$oid $size" ||
+				return 1
+			done <idx.sorted ||
 			return 1
 		done
 	} >expect.raw &&
-- 
2.43.0.514.g7147b80757


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-03  9:01 ` What's cooking in git.git (Jan 2024, #01; Tue, 2) Jeff King
@ 2024-01-03 16:37   ` Junio C Hamano
  2024-01-05  8:59     ` Jeff King
  2024-01-03 17:14   ` René Scharfe
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2024-01-03 16:37 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, git

Jeff King <peff@peff.net> writes:

> It looks like this is the original version. I posted a v2 that took
> René's suggestion to swap out the awk for shell, but it got overlooked.
> I'm happy enough either way, but if we want to salvage that effort,
> here's a patch which could go on top:

Thanks.  I was happy enough with the old one and placed the updated
one on backburner.

A commit message that explains why this incremental update (i.e.,
rewrite from awk to a shell loop) is a good idea below does make it
worthwhile ;-)

> -- >8 --
> From: René Scharfe <l.s.r@web.de>
> Subject: [PATCH] t1006: prefer shell loop to awk for packed object sizes
>
> To compute the expected on-disk size of packed objects, we sort the
> output of show-index by pack offset and then compute the difference
> between adjacent entries using awk. This works but has a few readability
> problems:
>
>   1. Reading the index in pack order means don't find out the size of an
>      oid's entry until we see the _next_ entry. So we have to save it to
>      print later.
>
>      We can instead iterate in reverse order, so we compute each oid's
>      size as we see it.

If you go forward, you need "the end of the previous round" (which
is "the beginning of the current round") to be subtracted from "the
end of the current round".  If you go forward, you have to have "the
beginning of the previous round" (which is "the end of the current
round") from which you subtract "the beginning of the current round".

So from that point of view, the only difference is that you would
not be ready to emit in the first round, and you would need to emit
for the last entry after the loop.  Because we happen to have the
end of the last entry outside the loop, we can omit the awkwardness.

OK.  But iterating over a list backwards is a bit awkward ;-).

>   2. Since the awk invocation is inside a text_expect block, we can't
>      easily use single-quotes to hold the script. So we use
>      double-quotes, but then have to escape the dollar signs in the awk
>      script.

Yup.  The joy of shell quoting rules ;-)

> I gave René authorship since this was his cleverness, but obviously I
> wrote the commit message. Giving an explicit signoff would be nice,
> though.

Indeed.

>  t/t1006-cat-file.sh | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 0c2eafae65..5ea3326128 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1117,14 +1117,16 @@ test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
>  		while read idx
>  		do
>  			git show-index <"$idx" >idx.raw &&
> -			sort -n <idx.raw >idx.sorted &&
> +			sort -nr <idx.raw >idx.sorted &&
>  			packsz=$(test_file_size "${idx%.idx}.pack") &&
>  			end=$((packsz - rawsz)) &&
> -			awk -v end="$end" "
> -			  NR > 1 { print oid, \$1 - start }
> -			  { start = \$1; oid = \$2 }
> -			  END { print oid, end - start }
> -			" idx.sorted ||
> +			while read start oid rest
> +			do
> +				size=$((end - start)) &&
> +				end=$start &&
> +				echo "$oid $size" ||
> +				return 1
> +			done <idx.sorted ||
>  			return 1
>  		done
>  	} >expect.raw &&

This is totally unrelated tangent, but the way "show-index" gets
invoked in the above loop makes readers wonder how the caller found
out which $idx file to read.

Of course, the above loop sits downstream of a pipe

    find .git/objects/pack -type f -name \*.idx

which means that any user of "git show-index" must be intimately
familiar with how the object database is structured.  I wonder if we
want an extra layer of abstraction, similar to how the reference
database can have different backend implementation.

Anyway, will queue.  Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-03  1:02 What's cooking in git.git (Jan 2024, #01; Tue, 2) Junio C Hamano
  2024-01-03  5:53 ` ps/refstorage-extension (was: What's cooking in git.git (Jan 2024, #01; Tue, 2)) Patrick Steinhardt
  2024-01-03  9:01 ` What's cooking in git.git (Jan 2024, #01; Tue, 2) Jeff King
@ 2024-01-03 16:43 ` Taylor Blau
  2024-01-03 18:08   ` Junio C Hamano
  2 siblings, 1 reply; 20+ messages in thread
From: Taylor Blau @ 2024-01-03 16:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Jan 02, 2024 at 05:02:35PM -0800, Junio C Hamano wrote:
> * tb/merge-tree-write-pack (2023-10-23) 5 commits
>  - builtin/merge-tree.c: implement support for `--write-pack`
>  - bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
>  - bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
>  - bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
>  - bulk-checkin: extract abstract `bulk_checkin_source`
>
>  "git merge-tree" learned "--write-pack" to record its result
>  without creating loose objects.
>
>  Broken when an object created during a merge is needed to continue merge
>  cf. <CABPp-BEfy9VOvimP9==ry_rZXu=metOQ8s=_-XiG_Pdx9c06Ww@mail.gmail.com>
>  source: <cover.1698101088.git.me@ttaylorr.com>

Let's drop this one.

> * tb/pair-chunk-expect (2023-11-10) 8 commits
>  - midx: read `OOFF` chunk with `pair_chunk_expect()`
>  - midx: read `OIDL` chunk with `pair_chunk_expect()`
>  - commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
>  - commit-graph: read `GDAT` chunk with `pair_chunk_expect()`
>  - commit-graph: read `CDAT` chunk with `pair_chunk_expect()`
>  - commit-graph: read `OIDL` chunk with `pair_chunk_expect()`
>  - chunk-format: introduce `pair_chunk_expect()` helper
>  - Merge branch 'jk/chunk-bounds-more' into HEAD
>
>  Further code clean-up.
>
>  Needs review.
>  source: <cover.1699569246.git.me@ttaylorr.com>

This one is on my list of things to look at, but probably not something
that I'll get to urgently before I've had a chance to clear my holiday
backlog. If you don't mind keeping it, that's fine, but I won't be upset
if it's easier to drop from 'seen' in the meantime.

> * tb/path-filter-fix (2023-10-18) 17 commits
>  - bloom: introduce `deinit_bloom_filters()`
>  - commit-graph: reuse existing Bloom filters where possible
>  - object.h: fix mis-aligned flag bits table
>  - commit-graph: drop unnecessary `graph_read_bloom_data_context`
>  - commit-graph.c: unconditionally load Bloom filters
>  - bloom: prepare to discard incompatible Bloom filters
>  - bloom: annotate filters with hash version
>  - commit-graph: new filter ver. that fixes murmur3
>  - repo-settings: introduce commitgraph.changedPathsVersion
>  - t4216: test changed path filters with high bit paths
>  - t/helper/test-read-graph: implement `bloom-filters` mode
>  - bloom.h: make `load_bloom_filter_from_graph()` public
>  - t/helper/test-read-graph.c: extract `dump_graph_info()`
>  - gitformat-commit-graph: describe version 2 of BDAT
>  - commit-graph: ensure Bloom filters are read with consistent settings
>  - revision.c: consult Bloom filters for root commits
>  - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
>
>  The Bloom filter used for path limited history traversal was broken
>  on systems whose "char" is unsigned; update the implementation and
>  bump the format version to 2.
>
>  Expecting a reroll.
>  cf. <20231023202212.GA5470@szeder.dev>
>  source: <cover.1697653929.git.me@ttaylorr.com>

I was confused by this one, since I couldn't figure out which tests
Gábor was referring to here. I responded in [1], but haven't heard back
since the end of October.

I personally think that this is ready to go, and it would be nice to get
it out of the perpetual "cooking" state that it's in. So if Gábor is
around to reply and I'm indeed missing something, that would be great.
But in the meantime, I think that this is ready to go.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/ZUARCJ1MmqgXfS4i@nand.local/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-03  9:01 ` What's cooking in git.git (Jan 2024, #01; Tue, 2) Jeff King
  2024-01-03 16:37   ` Junio C Hamano
@ 2024-01-03 17:14   ` René Scharfe
  1 sibling, 0 replies; 20+ messages in thread
From: René Scharfe @ 2024-01-03 17:14 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: git

Am 03.01.24 um 10:01 schrieb Jeff King:
> On Tue, Jan 02, 2024 at 05:02:35PM -0800, Junio C Hamano wrote:
>
>> * jk/t1006-cat-file-objectsize-disk (2023-12-21) 1 commit
>>   (merged to 'next' on 2023-12-28 at d82812e636)
>>  + t1006: add tests for %(objectsize:disk)
>>
>>  Test update.
>>
>>  Will merge to 'master'.
>>  source: <20231221094722.GA570888@coredump.intra.peff.net>
>
> It looks like this is the original version. I posted a v2 that took
> René's suggestion to swap out the awk for shell, but it got overlooked.
> I'm happy enough either way, but if we want to salvage that effort,
> here's a patch which could go on top:
>
> -- >8 --
> From: René Scharfe <l.s.r@web.de>
> Subject: [PATCH] t1006: prefer shell loop to awk for packed object sizes
>
> To compute the expected on-disk size of packed objects, we sort the
> output of show-index by pack offset and then compute the difference
> between adjacent entries using awk. This works but has a few readability
> problems:
>
>   1. Reading the index in pack order means don't find out the size of an
>      oid's entry until we see the _next_ entry. So we have to save it to
>      print later.
>
>      We can instead iterate in reverse order, so we compute each oid's
>      size as we see it.
>
>   2. Since the awk invocation is inside a text_expect block, we can't
>      easily use single-quotes to hold the script. So we use
>      double-quotes, but then have to escape the dollar signs in the awk
>      script.
>
>      We can swap this out for a shell loop instead (which is made much
>      easier by the first change).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I gave René authorship since this was his cleverness, but obviously I
> wrote the commit message. Giving an explicit signoff would be nice,
> though.

Alright, thank you!

Signed-off-by: René Scharfe <l.s.r@web.de>

>
>  t/t1006-cat-file.sh | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 0c2eafae65..5ea3326128 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1117,14 +1117,16 @@ test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
>  		while read idx
>  		do
>  			git show-index <"$idx" >idx.raw &&
> -			sort -n <idx.raw >idx.sorted &&
> +			sort -nr <idx.raw >idx.sorted &&
>  			packsz=$(test_file_size "${idx%.idx}.pack") &&
>  			end=$((packsz - rawsz)) &&
> -			awk -v end="$end" "
> -			  NR > 1 { print oid, \$1 - start }
> -			  { start = \$1; oid = \$2 }
> -			  END { print oid, end - start }
> -			" idx.sorted ||
> +			while read start oid rest
> +			do
> +				size=$((end - start)) &&
> +				end=$start &&
> +				echo "$oid $size" ||
> +				return 1
> +			done <idx.sorted ||
>  			return 1
>  		done
>  	} >expect.raw &&

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-03 16:43 ` Taylor Blau
@ 2024-01-03 18:08   ` Junio C Hamano
  2024-01-13 18:35     ` SZEDER Gábor
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2024-01-03 18:08 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git

Taylor Blau <me@ttaylorr.com> writes:

>> * tb/path-filter-fix (2023-10-18) 17 commits
>>  - bloom: introduce `deinit_bloom_filters()`
>>  ...
>>  - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
>>
>>  The Bloom filter used for path limited history traversal was broken
>>  on systems whose "char" is unsigned; update the implementation and
>>  bump the format version to 2.
>>
>>  Expecting a reroll.
>>  cf. <20231023202212.GA5470@szeder.dev>
>>  source: <cover.1697653929.git.me@ttaylorr.com>
>
> I was confused by this one, since I couldn't figure out which tests
> Gábor was referring to here. I responded in [1], but haven't heard back
> since the end of October.
> ...
> [1]: https://lore.kernel.org/git/ZUARCJ1MmqgXfS4i@nand.local/

OK, let's ping just once then.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-03 16:37   ` Junio C Hamano
@ 2024-01-05  8:59     ` Jeff King
  2024-01-05 16:34       ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2024-01-05  8:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, git

On Wed, Jan 03, 2024 at 08:37:59AM -0800, Junio C Hamano wrote:

> This is totally unrelated tangent, but the way "show-index" gets
> invoked in the above loop makes readers wonder how the caller found
> out which $idx file to read.
> 
> Of course, the above loop sits downstream of a pipe
> 
>     find .git/objects/pack -type f -name \*.idx
> 
> which means that any user of "git show-index" must be intimately
> familiar with how the object database is structured.  I wonder if we
> want an extra layer of abstraction, similar to how the reference
> database can have different backend implementation.

I assume you mean a test helper by "extra layer of abstraction".

That could make sense, though I think this is just the tip of the
iceberg. There are a bazillion tests that muck with .git/objects/
directly (especially ones finding and munging loose objects). So it
wouldn't do much good until we know what cases the abstract code would
have to handle. And I don't think we have any concrete alternative yet
to the current object-dir layout.

So I think we should just punt on it for now. Adding one case here is
not making a hypothetical abstraction layer significantly harder to add
later.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-05  8:59     ` Jeff King
@ 2024-01-05 16:34       ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2024-01-05 16:34 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, git

Jeff King <peff@peff.net> writes:

> On Wed, Jan 03, 2024 at 08:37:59AM -0800, Junio C Hamano wrote:
>
>> This is totally unrelated tangent, but the way "show-index" gets
>> invoked in the above loop makes readers wonder how the caller found
>> out which $idx file to read.
>> 
>> Of course, the above loop sits downstream of a pipe
>> 
>>     find .git/objects/pack -type f -name \*.idx
>> 
>> which means that any user of "git show-index" must be intimately
>> familiar with how the object database is structured.  I wonder if we
>> want an extra layer of abstraction, similar to how the reference
>> database can have different backend implementation.
>
> I assume you mean a test helper by "extra layer of abstraction".

Well, that might be a good first step, but I was envisioning more
into far future where the object store is reimplemented on top of
high-performance database, at which point we may want subcommands
like "git odb list-packs" that lists packfile names (which may not
name any on-filesystem files) that can be given to "git show-index",
for example.  Of course, the abstraction boundary of the object
store might be placed a lot higher and the interface into it may be
limited to "here is an oid, give me its contents" without distinction
between the objects in packfiles and objects in loose object files.

> So I think we should just punt on it for now. Adding one case here is
> not making a hypothetical abstraction layer significantly harder to add
> later.

Totally agree, and that is why I labeled the whole thing as an
unrelated tangent.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-03 18:08   ` Junio C Hamano
@ 2024-01-13 18:35     ` SZEDER Gábor
  2024-01-13 22:06       ` Taylor Blau
  2024-01-13 22:51       ` SZEDER Gábor
  0 siblings, 2 replies; 20+ messages in thread
From: SZEDER Gábor @ 2024-01-13 18:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Wed, Jan 03, 2024 at 10:08:18AM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
> >> * tb/path-filter-fix (2023-10-18) 17 commits
> >>  - bloom: introduce `deinit_bloom_filters()`
> >>  ...
> >>  - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
> >>
> >>  The Bloom filter used for path limited history traversal was broken
> >>  on systems whose "char" is unsigned; update the implementation and
> >>  bump the format version to 2.
> >>
> >>  Expecting a reroll.
> >>  cf. <20231023202212.GA5470@szeder.dev>
> >>  source: <cover.1697653929.git.me@ttaylorr.com>
> >
> > I was confused by this one, since I couldn't figure out which tests
> > Gábor was referring to here. I responded in [1], but haven't heard back
> > since the end of October.
> > ...
> > [1]: https://lore.kernel.org/git/ZUARCJ1MmqgXfS4i@nand.local/

I keep referring to the test in:

  https://public-inbox.org/git/20230830200218.GA5147@szeder.dev/

which, rather disappointingly, is still the only test out there
exercising the interaction between split commit graphs and different
modified path Bloom filter versions.  Note that in that message I
mentioned that merging layers with differenet Bloom filter versions
seemed to work, but that, alas, is no longer the case, because it's
now broken in Taylor's recent iterations of the patch series.

At the risk of sounding like a broken record: the interaction of split
commit graphs and different Bloom filter versions should be thoroughly
tested.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-13 18:35     ` SZEDER Gábor
@ 2024-01-13 22:06       ` Taylor Blau
  2024-01-13 23:41         ` SZEDER Gábor
  2024-01-13 22:51       ` SZEDER Gábor
  1 sibling, 1 reply; 20+ messages in thread
From: Taylor Blau @ 2024-01-13 22:06 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Hi Gábor,

Thanks very much for your patience while I figure all of this out. I'm
confident that with the fixup! below that your test passes in the next
round of this series.

On Sat, Jan 13, 2024 at 07:35:44PM +0100, SZEDER Gábor wrote:
> On Wed, Jan 03, 2024 at 10:08:18AM -0800, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > >> * tb/path-filter-fix (2023-10-18) 17 commits
> > >>  - bloom: introduce `deinit_bloom_filters()`
> > >>  ...
> > >>  - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
> > >>
> > >>  The Bloom filter used for path limited history traversal was broken
> > >>  on systems whose "char" is unsigned; update the implementation and
> > >>  bump the format version to 2.
> > >>
> > >>  Expecting a reroll.
> > >>  cf. <20231023202212.GA5470@szeder.dev>
> > >>  source: <cover.1697653929.git.me@ttaylorr.com>
> > >
> > > I was confused by this one, since I couldn't figure out which tests
> > > Gábor was referring to here. I responded in [1], but haven't heard back
> > > since the end of October.
> > > ...
> > > [1]: https://lore.kernel.org/git/ZUARCJ1MmqgXfS4i@nand.local/
>
> I keep referring to the test in:
>
>   https://public-inbox.org/git/20230830200218.GA5147@szeder.dev/
>
> which, rather disappointingly, is still the only test out there
> exercising the interaction between split commit graphs and different
> modified path Bloom filter versions.  Note that in that message I
> mentioned that merging layers with differenet Bloom filter versions
> seemed to work, but that, alas, is no longer the case, because it's
> now broken in Taylor's recent iterations of the patch series.

Thanks for clarifying. With all of the different versions of this series
and the couple of tests that you and I were talking about, I think that
I got confused and thought you were referring to a different test.

Indeed, the current version of this series (v4) does not pass the test
in <20230830200218.GA5147@szeder.dev>, but the fix in order for it to
pass is relatively straightforward.

I have this on top of my local branch as a fixup! and I'll squash it
down on Tuesday[^1] and send a reroll after double-checking that the fix
is correct and doesn't conflict with any other parts of the series.

Since it's been so long since I've looked at this, I'll re-read the
series as a whole with fresh eyes to make sure that everything still
makes sense.

In any case, here's the patch on top (with a lightly modified version of
the test you wrote in <20230830200218.GA5147@szeder.dev>:

Subject: [PATCH] fixup! commit-graph: ensure Bloom filters are read with
 consistent settings

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c       |  3 ++-
 t/t4216-log-bloom.sh | 29 ++++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 60fa64d956..82a4632c4e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -507,7 +507,8 @@ static void validate_mixed_bloom_settings(struct commit_graph *g)
 		}

 		if (g->bloom_filter_settings->bits_per_entry != settings->bits_per_entry ||
-		    g->bloom_filter_settings->num_hashes != settings->num_hashes) {
+		    g->bloom_filter_settings->num_hashes != settings->num_hashes ||
+		    g->bloom_filter_settings->hash_version != settings->hash_version) {
 			g->chunk_bloom_indexes = NULL;
 			g->chunk_bloom_data = NULL;
 			FREE_AND_NULL(g->bloom_filter_settings);
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 569f2b6f8b..d8c42e2e27 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -438,7 +438,7 @@ test_expect_success 'setup for mixed Bloom setting tests' '
 	done
 '

-test_expect_success 'ensure incompatible Bloom filters are ignored' '
+test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
 	# Compute Bloom filters with "unusual" settings.
 	git -C $repo rev-parse one >in &&
 	GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
@@ -488,6 +488,33 @@ test_expect_success 'merge graph layers with incompatible Bloom settings' '
 	test_must_be_empty err
 '

+test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
+	rm "$repo/$graph" &&
+
+	git -C $repo log --oneline --no-decorate -- $CENT >expect &&
+
+	# Compute v1 Bloom filters for commits at the bottom.
+	git -C $repo rev-parse HEAD^ >in &&
+	git -C $repo commit-graph write --stdin-commits --changed-paths \
+		--split <in &&
+
+	# Compute v2 Bloomfilters for the rest of the commits at the top.
+	git -C $repo rev-parse HEAD >in &&
+	git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write \
+		--stdin-commits --changed-paths --split=no-merge <in &&
+
+	test_line_count = 2 $repo/$chain &&
+
+	git -C $repo log --oneline --no-decorate -- $CENT >actual 2>err &&
+	test_cmp expect actual &&
+
+	layer="$(head -n 1 $repo/$chain)" &&
+	cat >expect.err <<-EOF &&
+	warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
+	EOF
+	test_cmp expect.err err
+'
+
 get_first_changed_path_filter () {
 	test-tool read-graph bloom-filters >filters.dat &&
 	head -n 1 filters.dat

--
2.38.0.16.g393fd4c6db

(Excuse the old version of Git here, I'm in the middle of a long-running
process which checks out older versions of the tree to count the number
of unused-parameters over time.)

Thanks,
Taylor

[^1]: Monday is a US holiday, so I likely won't be at my computer.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-13 18:35     ` SZEDER Gábor
  2024-01-13 22:06       ` Taylor Blau
@ 2024-01-13 22:51       ` SZEDER Gábor
  2024-01-16 20:49         ` Taylor Blau
  1 sibling, 1 reply; 20+ messages in thread
From: SZEDER Gábor @ 2024-01-13 22:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Sat, Jan 13, 2024 at 07:35:44PM +0100, SZEDER Gábor wrote:
> On Wed, Jan 03, 2024 at 10:08:18AM -0800, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> > 
> > >> * tb/path-filter-fix (2023-10-18) 17 commits
> > >>  - bloom: introduce `deinit_bloom_filters()`
> > >>  ...
> > >>  - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
> > >>
> > >>  The Bloom filter used for path limited history traversal was broken
> > >>  on systems whose "char" is unsigned; update the implementation and
> > >>  bump the format version to 2.
> > >>
> > >>  Expecting a reroll.
> > >>  cf. <20231023202212.GA5470@szeder.dev>
> > >>  source: <cover.1697653929.git.me@ttaylorr.com>
> > >
> > > I was confused by this one, since I couldn't figure out which tests
> > > Gábor was referring to here. I responded in [1], but haven't heard back
> > > since the end of October.
> > > ...
> > > [1]: https://lore.kernel.org/git/ZUARCJ1MmqgXfS4i@nand.local/
> 
> I keep referring to the test in:
> 
>   https://public-inbox.org/git/20230830200218.GA5147@szeder.dev/
> 
> which, rather disappointingly, is still the only test out there
> exercising the interaction between split commit graphs and different
> modified path Bloom filter versions.  Note that in that message I
> mentioned that merging layers with differenet Bloom filter versions
> seemed to work, but that, alas, is no longer the case, because it's
> now broken in Taylor's recent iterations of the patch series.
> 
> At the risk of sounding like a broken record: the interaction of split
> commit graphs and different Bloom filter versions should be thoroughly
> tested.

On a related note, if current git (I tried current master and v2.43.0)
encounters a commit graph layer containing v2 Bloom filters (created
by current seen) while writing a new commit graph, then it segfaults
dereferencing a NULL 'settings' pointer in
get_or_compute_bloom_filter().

The test below demonstrates this, but it's quite hacky using two
different git versions: it has to be run by an old git version not yet
supporting v2 Bloom filters, and a new git version already supporting
them should be installed at /tmp/git-new/.


diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 2ba0324a69..0464dd68d5 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -454,4 +454,33 @@ test_expect_success 'Bloom reader notices out-of-order index offsets' '
 	test_cmp expect.err err
 '
 
+CENT=$(printf "\302\242")
+test_expect_success 'split commit graph vs changed paths Bloom filter v2 vs old git' '
+	git init split-v2-old &&
+	(
+		cd split-v2-old &&
+		git commit --allow-empty -m "Bloom filters are written but still ignored for root commits :(" &&
+		for i in 1 2 3
+		do
+			echo $i >$CENT &&
+			git add $CENT &&
+			git commit -m "$i" || return 1
+		done &&
+		git log --oneline -- $CENT >expect &&
+
+		# Here we write a commit graph layer containing v2 changed
+		# path Bloom filters using a git binary built from current
+		# 'seen' branch.
+		git rev-parse HEAD^ |
+		/tmp/git-new/bin/git -c commitgraph.changedPathsVersion=2 \
+			commit-graph write --stdin-commits --changed-paths --split &&
+
+		# This is current master, and segfaults.
+		git commit-graph write --reachable --changed-paths &&
+
+		git log --oneline -- $CENT >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done



(gdb) r
Starting program: /home/szeder/src/git/git commit-graph write --reachable --changed-paths
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555556b8714 in get_or_compute_bloom_filter (r=0x555555a0e5a0 <the_repo>, c=0x555555a1cdd0, compute_if_not_present=1, settings=0x0, computed=0x7fffffffd634) at bloom.c:253
253		diffopt.max_changes = settings->max_changed_paths;
(gdb) l
248			return NULL;
249	
250		repo_diff_setup(r, &diffopt);
251		diffopt.flags.recursive = 1;
252		diffopt.detect_rename = 0;
253		diffopt.max_changes = settings->max_changed_paths;
254		diff_setup_done(&diffopt);
255	
256		/* ensure commit is parsed so we have parent information */
257		repo_parse_commit(r, c);
(gdb) bt
#0  0x00005555556b8714 in get_or_compute_bloom_filter (
    r=0x555555a0e5a0 <the_repo>, c=0x555555a1cdd0, compute_if_not_present=1, 
    settings=0x0, computed=0x7fffffffd634) at bloom.c:253
#1  0x00005555556d16d5 in compute_bloom_filters (ctx=0x555555a1c200)
    at commit-graph.c:1783
#2  0x00005555556d4329 in write_commit_graph (odb=0x555555a19210, 
    pack_indexes=0x0, commits=0x7fffffffd7c0, 
    flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_BLOOM_FILTERS), 
    opts=0x5555559ef8d0 <write_opts>) at commit-graph.c:2603
#3  0x00005555556d19ab in write_commit_graph_reachable (odb=0x555555a19210, 
    flags=(COMMIT_GRAPH_WRITE_PROGRESS | COMMIT_GRAPH_WRITE_BLOOM_FILTERS), 
    opts=0x5555559ef8d0 <write_opts>) at commit-graph.c:1849
#4  0x00005555555a63c2 in graph_write (argc=0, argv=0x7fffffffde20, prefix=0x0)
    at builtin/commit-graph.c:294
#5  0x00005555555a66f4 in cmd_commit_graph (argc=3, argv=0x7fffffffde20, 
    prefix=0x0) at builtin/commit-graph.c:353
#6  0x0000555555575b43 in run_builtin (p=0x5555559da260 <commands+576>, 
    argc=4, argv=0x7fffffffde20) at git.c:469
#7  0x0000555555575fcb in handle_builtin (argc=4, argv=0x7fffffffde20)
    at git.c:724
#8  0x0000555555576272 in run_argv (argcp=0x7fffffffdc7c, argv=0x7fffffffdc70)
    at git.c:788
#9  0x000055555557685d in cmd_main (argc=4, argv=0x7fffffffde20) at git.c:923
#10 0x000055555568ad55 in main (argc=5, argv=0x7fffffffde18)
    at common-main.c:62



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-13 22:06       ` Taylor Blau
@ 2024-01-13 23:41         ` SZEDER Gábor
  2024-01-16 20:37           ` Taylor Blau
  0 siblings, 1 reply; 20+ messages in thread
From: SZEDER Gábor @ 2024-01-13 23:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git

On Sat, Jan 13, 2024 at 05:06:11PM -0500, Taylor Blau wrote:
> Hi Gábor,
> 
> Thanks very much for your patience while I figure all of this out. I'm
> confident that with the fixup! below that your test passes in the next
> round of this series.
> 
> On Sat, Jan 13, 2024 at 07:35:44PM +0100, SZEDER Gábor wrote:
> > On Wed, Jan 03, 2024 at 10:08:18AM -0800, Junio C Hamano wrote:
> > > Taylor Blau <me@ttaylorr.com> writes:
> > >
> > > >> * tb/path-filter-fix (2023-10-18) 17 commits
> > > >>  - bloom: introduce `deinit_bloom_filters()`
> > > >>  ...
> > > >>  - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
> > > >>
> > > >>  The Bloom filter used for path limited history traversal was broken
> > > >>  on systems whose "char" is unsigned; update the implementation and
> > > >>  bump the format version to 2.
> > > >>
> > > >>  Expecting a reroll.
> > > >>  cf. <20231023202212.GA5470@szeder.dev>
> > > >>  source: <cover.1697653929.git.me@ttaylorr.com>
> > > >
> > > > I was confused by this one, since I couldn't figure out which tests
> > > > Gábor was referring to here. I responded in [1], but haven't heard back
> > > > since the end of October.
> > > > ...
> > > > [1]: https://lore.kernel.org/git/ZUARCJ1MmqgXfS4i@nand.local/
> >
> > I keep referring to the test in:
> >
> >   https://public-inbox.org/git/20230830200218.GA5147@szeder.dev/
> >
> > which, rather disappointingly, is still the only test out there
> > exercising the interaction between split commit graphs and different
> > modified path Bloom filter versions.  Note that in that message I
> > mentioned that merging layers with differenet Bloom filter versions
> > seemed to work, but that, alas, is no longer the case, because it's
> > now broken in Taylor's recent iterations of the patch series.
> 
> Thanks for clarifying. With all of the different versions of this series
> and the couple of tests that you and I were talking about, I think that
> I got confused and thought you were referring to a different test.
> 
> Indeed, the current version of this series (v4) does not pass the test
> in <20230830200218.GA5147@szeder.dev>, but the fix in order for it to
> pass is relatively straightforward.
> 
> I have this on top of my local branch as a fixup! and I'll squash it
> down on Tuesday[^1] and send a reroll after double-checking that the fix
> is correct and doesn't conflict with any other parts of the series.
> 
> Since it's been so long since I've looked at this, I'll re-read the
> series as a whole with fresh eyes to make sure that everything still
> makes sense.
> 
> In any case, here's the patch on top (with a lightly modified version of
> the test you wrote in <20230830200218.GA5147@szeder.dev>:

I certainly hope that I'm just misunderstanding, and you don't
actually imply that this one test in its current form would qualify as
thorough testing... :)


> Subject: [PATCH] fixup! commit-graph: ensure Bloom filters are read with
>  consistent settings
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  commit-graph.c       |  3 ++-
>  t/t4216-log-bloom.sh | 29 ++++++++++++++++++++++++++++-
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 60fa64d956..82a4632c4e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -507,7 +507,8 @@ static void validate_mixed_bloom_settings(struct commit_graph *g)
>  		}
> 
>  		if (g->bloom_filter_settings->bits_per_entry != settings->bits_per_entry ||
> -		    g->bloom_filter_settings->num_hashes != settings->num_hashes) {
> +		    g->bloom_filter_settings->num_hashes != settings->num_hashes ||
> +		    g->bloom_filter_settings->hash_version != settings->hash_version) {
>  			g->chunk_bloom_indexes = NULL;
>  			g->chunk_bloom_data = NULL;
>  			FREE_AND_NULL(g->bloom_filter_settings);
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 569f2b6f8b..d8c42e2e27 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -438,7 +438,7 @@ test_expect_success 'setup for mixed Bloom setting tests' '
>  	done
>  '
> 
> -test_expect_success 'ensure incompatible Bloom filters are ignored' '
> +test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
>  	# Compute Bloom filters with "unusual" settings.
>  	git -C $repo rev-parse one >in &&
>  	GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
> @@ -488,6 +488,33 @@ test_expect_success 'merge graph layers with incompatible Bloom settings' '
>  	test_must_be_empty err
>  '
> 
> +test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
> +	rm "$repo/$graph" &&
> +
> +	git -C $repo log --oneline --no-decorate -- $CENT >expect &&
> +
> +	# Compute v1 Bloom filters for commits at the bottom.
> +	git -C $repo rev-parse HEAD^ >in &&
> +	git -C $repo commit-graph write --stdin-commits --changed-paths \
> +		--split <in &&
> +
> +	# Compute v2 Bloomfilters for the rest of the commits at the top.
> +	git -C $repo rev-parse HEAD >in &&
> +	git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write \
> +		--stdin-commits --changed-paths --split=no-merge <in &&
> +
> +	test_line_count = 2 $repo/$chain &&
> +
> +	git -C $repo log --oneline --no-decorate -- $CENT >actual 2>err &&
> +	test_cmp expect actual &&
> +
> +	layer="$(head -n 1 $repo/$chain)" &&
> +	cat >expect.err <<-EOF &&
> +	warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
> +	EOF
> +	test_cmp expect.err err
> +'
> +
>  get_first_changed_path_filter () {
>  	test-tool read-graph bloom-filters >filters.dat &&
>  	head -n 1 filters.dat
> 
> --
> 2.38.0.16.g393fd4c6db
> 
> (Excuse the old version of Git here, I'm in the middle of a long-running
> process which checks out older versions of the tree to count the number
> of unused-parameters over time.)
> 
> Thanks,
> Taylor
> 
> [^1]: Monday is a US holiday, so I likely won't be at my computer.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-13 23:41         ` SZEDER Gábor
@ 2024-01-16 20:37           ` Taylor Blau
  2024-02-25 22:59             ` SZEDER Gábor
  0 siblings, 1 reply; 20+ messages in thread
From: Taylor Blau @ 2024-01-16 20:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Hi Gábor,

On Sun, Jan 14, 2024 at 12:41:34AM +0100, SZEDER Gábor wrote:
> > In any case, here's the patch on top (with a lightly modified version of
> > the test you wrote in <20230830200218.GA5147@szeder.dev>:
>
> I certainly hope that I'm just misunderstanding, and you don't
> actually imply that this one test in its current form would qualify as
> thorough testing... :)

I hear what you're saying, though I think that the interesting behavior
that would be most likely to regress is the transition between different
Bloom filter settings/hash-version across split commit-graph layers.

We have extensive tests on either "side" of this transition for both v1
and v2 Bloom filters, so I'm not sure what we'd want to add there. Like
I said, the transition is the primary (previously-)untested area of this
code that I would want to ensure is covered to protect against
regressions.

I think that the most recent round of this series accomplishes that
goal.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-13 22:51       ` SZEDER Gábor
@ 2024-01-16 20:49         ` Taylor Blau
  2024-01-16 22:45           ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Taylor Blau @ 2024-01-16 20:49 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

On Sat, Jan 13, 2024 at 11:51:57PM +0100, SZEDER Gábor wrote:
> On a related note, if current git (I tried current master and v2.43.0)
> encounters a commit graph layer containing v2 Bloom filters (created
> by current seen) while writing a new commit graph, then it segfaults
> dereferencing a NULL 'settings' pointer in
> get_or_compute_bloom_filter().
>
> The test below demonstrates this, but it's quite hacky using two
> different git versions: it has to be run by an old git version not yet
> supporting v2 Bloom filters, and a new git version already supporting
> them should be installed at /tmp/git-new/.
>
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 2ba0324a69..0464dd68d5 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -454,4 +454,33 @@ test_expect_success 'Bloom reader notices out-of-order index offsets' '
>  	test_cmp expect.err err
>  '
>
> +CENT=$(printf "\302\242")
> +test_expect_success 'split commit graph vs changed paths Bloom filter v2 vs old git' '
> +	git init split-v2-old &&
> +	(
> +		cd split-v2-old &&
> +		git commit --allow-empty -m "Bloom filters are written but still ignored for root commits :(" &&
> +		for i in 1 2 3
> +		do
> +			echo $i >$CENT &&
> +			git add $CENT &&
> +			git commit -m "$i" || return 1
> +		done &&
> +		git log --oneline -- $CENT >expect &&
> +
> +		# Here we write a commit graph layer containing v2 changed
> +		# path Bloom filters using a git binary built from current
> +		# 'seen' branch.
> +		git rev-parse HEAD^ |
> +		/tmp/git-new/bin/git -c commitgraph.changedPathsVersion=2 \
> +			commit-graph write --stdin-commits --changed-paths --split &&
> +
> +		# This is current master, and segfaults.
> +		git commit-graph write --reachable --changed-paths &&
> +
> +		git log --oneline -- $CENT >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_done

Thanks. The segfault is reproducible on my end, but I don't think that
it is possible to fix this for existing versions of Git. The problem (as
you note in your backtrace) is here:

    #0  0x000055555569c842 in get_or_compute_bloom_filter (
        r=0x5555559c9ce0 <the_repo>, c=0x5555559dffd0, compute_if_not_present=1,
        settings=0x0, computed=0x7fffffffe0f4) at bloom.c:253

Which tries to dereference ctx->bloom_settings, which is NULL. Note that
we initialize some sensible defaults for ctx->bloom_settings in
commit-graph.c::write_commit_graph():

    struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
    /* ... */
    bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
                                                  bloom_settings.bits_per_entry);
    bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
                                              bloom_settings.num_hashes);
    bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS",
                                                     bloom_settings.max_changed_paths);
    ctx->bloom_settings = &bloom_settings;

...but we'll throw those away in favor of whatever is in the topmost
layer of the existing commit-graph chain later on in that same function:

    if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
      struct commit_graph *g;

      g = ctx->r->objects->commit_graph;

      /* We have changed-paths already. Keep them in the next graph */
      if (g && g->chunk_bloom_data) {
        ctx->changed_paths = 1;
        ctx->bloom_settings = g->bloom_filter_settings;
      }
    }

OK, everything seems fine thus far, until we inspect the value of
g->bloom_filter_settings, which is NULL, becuase of this hunk from
commit-graph.c::graph_read_bloom_data():

    if (hash_version != 1)
      return 0;

which terminates the function before we assign g->bloom_filter_settings
for the existing (written with v2 Bloom filters) graph layer.

I don't think that there is a way to fix this in a backwards compatible
way, but I'm comfortable with that in this instance since we don't
expect users to upgrading to v2 Bloom filters and then writing new graph
layers using a non-v2 compatible version of Git.

We can add a warning in the series that I'm working on indicating this,
but I don't think there's much more we can do besides changing this to
indicate a warning and bailing instead of segfaulting.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-16 20:49         ` Taylor Blau
@ 2024-01-16 22:45           ` Junio C Hamano
  2024-01-16 23:31             ` Taylor Blau
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2024-01-16 22:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: SZEDER Gábor, git

Taylor Blau <me@ttaylorr.com> writes:

> OK, everything seems fine thus far, until we inspect the value of
> g->bloom_filter_settings, which is NULL, becuase of this hunk from
> commit-graph.c::graph_read_bloom_data():
>
>     if (hash_version != 1)
>       return 0;
>
> which terminates the function before we assign g->bloom_filter_settings
> for the existing (written with v2 Bloom filters) graph layer.
>
> I don't think that there is a way to fix this in a backwards compatible
> way, but I'm comfortable with that in this instance since we don't
> expect users to upgrading to v2 Bloom filters and then writing new graph
> layers using a non-v2 compatible version of Git.

A big red button solution to avoid this would be to uprev the
repository format version once you start writing v2 Bloom filters
anywhere in the layers.  That way, existing Git clients would not be
able to touch it.  I do not know if the cure is more severe than the
disease in that case, though.

In any case, at least, we should be able to prepare the code that we
teach to grok v2 today so that they do not trigger the same segfault
when they see a commit graph layer containing v3 Bloom filters (or
later).  Then we won't have to have the same conversation when we
somehow need to update Bloom filters again.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-16 22:45           ` Junio C Hamano
@ 2024-01-16 23:31             ` Taylor Blau
  2024-01-16 23:42               ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Taylor Blau @ 2024-01-16 23:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SZEDER Gábor, git

On Tue, Jan 16, 2024 at 02:45:41PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > OK, everything seems fine thus far, until we inspect the value of
> > g->bloom_filter_settings, which is NULL, becuase of this hunk from
> > commit-graph.c::graph_read_bloom_data():
> >
> >     if (hash_version != 1)
> >       return 0;
> >
> > which terminates the function before we assign g->bloom_filter_settings
> > for the existing (written with v2 Bloom filters) graph layer.
> >
> > I don't think that there is a way to fix this in a backwards compatible
> > way, but I'm comfortable with that in this instance since we don't
> > expect users to upgrading to v2 Bloom filters and then writing new graph
> > layers using a non-v2 compatible version of Git.
>
> A big red button solution to avoid this would be to uprev the
> repository format version once you start writing v2 Bloom filters
> anywhere in the layers.  That way, existing Git clients would not be
> able to touch it.  I do not know if the cure is more severe than the
> disease in that case, though.

I tend to think that in this case the cure is probably worse than the
disease. I expect it to be extremely rare that a user would upgrade to a
modern version of Git, write commit-graphs, then downgrade, and try to
write more commit-graphs.

> In any case, at least, we should be able to prepare the code that we
> teach to grok v2 today so that they do not trigger the same segfault
> when they see a commit graph layer containing v3 Bloom filters (or
> later).  Then we won't have to have the same conversation when we
> somehow need to update Bloom filters again.

This series should accomplish that by loading the Bloom chunk
unconditionally, and only reading its filters when they match the
given hash_version.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-16 23:31             ` Taylor Blau
@ 2024-01-16 23:42               ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2024-01-16 23:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: SZEDER Gábor, git

Taylor Blau <me@ttaylorr.com> writes:

>> A big red button solution to avoid this would be to uprev the
>> repository format version once you start writing v2 Bloom filters
>> anywhere in the layers.  That way, existing Git clients would not be
>> able to touch it.  I do not know if the cure is more severe than the
>> disease in that case, though.
>
> I tend to think that in this case the cure is probably worse than the
> disease. I expect it to be extremely rare that a user would upgrade to a
> modern version of Git, write commit-graphs, then downgrade, and try to
> write more commit-graphs.

But then the big red button solution would rarely misfire for users
because they will not downgrade (and see "gee, I now need to stick
to the newer version"), no?

I am not seriously suggesting to do this, but I am not sure if
documenting "don't do this because you'll break your repository"
is sufficient.

>> In any case, at least, we should be able to prepare the code that we
>> teach to grok v2 today so that they do not trigger the same segfault
>> when they see a commit graph layer containing v3 Bloom filters (or
>> later).  Then we won't have to have the same conversation when we
>> somehow need to update Bloom filters again.
>
> This series should accomplish that by loading the Bloom chunk
> unconditionally, and only reading its filters when they match the
> given hash_version.

Good.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-01-16 20:37           ` Taylor Blau
@ 2024-02-25 22:59             ` SZEDER Gábor
  2024-02-26 14:44               ` Taylor Blau
  0 siblings, 1 reply; 20+ messages in thread
From: SZEDER Gábor @ 2024-02-25 22:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git

On Tue, Jan 16, 2024 at 03:37:24PM -0500, Taylor Blau wrote:
> Hi Gábor,
> 
> On Sun, Jan 14, 2024 at 12:41:34AM +0100, SZEDER Gábor wrote:
> > > In any case, here's the patch on top (with a lightly modified version of
> > > the test you wrote in <20230830200218.GA5147@szeder.dev>:
> >
> > I certainly hope that I'm just misunderstanding, and you don't
> > actually imply that this one test in its current form would qualify as
> > thorough testing... :)
> 
> I hear what you're saying, though I think that the interesting behavior
> that would be most likely to regress is the transition between different
> Bloom filter settings/hash-version across split commit-graph layers.
> 
> We have extensive tests on either "side" of this transition for both v1
> and v2 Bloom filters, so I'm not sure what we'd want to add there. Like
> I said, the transition is the primary (previously-)untested area of this
> code that I would want to ensure is covered to protect against
> regressions.
> 
> I think that the most recent round of this series accomplishes that
> goal.

It's great that we finally have test cases for different Bloom filter
settings in different commit-graph layers, including a test case that
merges those layers, but that test case doesn't check that the
resulting merged commit-graph file contains the right settings.  And
there is still no test case that merges layers with different Bloom
filter versions.
I think adding these would be the bare minimum...  and would need more
for due diligence.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
  2024-02-25 22:59             ` SZEDER Gábor
@ 2024-02-26 14:44               ` Taylor Blau
  0 siblings, 0 replies; 20+ messages in thread
From: Taylor Blau @ 2024-02-26 14:44 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Hi SZEDER,

On Sun, Feb 25, 2024 at 11:59:18PM +0100, SZEDER Gábor wrote:
> It's great that we finally have test cases for different Bloom filter
> settings in different commit-graph layers, including a test case that
> merges those layers, but that test case doesn't check that the
> resulting merged commit-graph file contains the right settings.  And
> there is still no test case that merges layers with different Bloom
> filter versions.

Thanks for reviewing.

I'm happy to produce another round of this series that would address
what you've added here.

Are there any other specific things you'd like to see addressed in a
subsequent round? I want to make sure that I'm addressing all of your
concerns, and avoid the need for yet another round of this series that
addresses things that I could have done in the first place.

> I think adding these would be the bare minimum...  and would need more
> for due diligence.

As I said above, I'm happy to add these things in, but please do let me
know if there are others.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-02-26 14:44 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-03  1:02 What's cooking in git.git (Jan 2024, #01; Tue, 2) Junio C Hamano
2024-01-03  5:53 ` ps/refstorage-extension (was: What's cooking in git.git (Jan 2024, #01; Tue, 2)) Patrick Steinhardt
2024-01-03  9:01 ` What's cooking in git.git (Jan 2024, #01; Tue, 2) Jeff King
2024-01-03 16:37   ` Junio C Hamano
2024-01-05  8:59     ` Jeff King
2024-01-05 16:34       ` Junio C Hamano
2024-01-03 17:14   ` René Scharfe
2024-01-03 16:43 ` Taylor Blau
2024-01-03 18:08   ` Junio C Hamano
2024-01-13 18:35     ` SZEDER Gábor
2024-01-13 22:06       ` Taylor Blau
2024-01-13 23:41         ` SZEDER Gábor
2024-01-16 20:37           ` Taylor Blau
2024-02-25 22:59             ` SZEDER Gábor
2024-02-26 14:44               ` Taylor Blau
2024-01-13 22:51       ` SZEDER Gábor
2024-01-16 20:49         ` Taylor Blau
2024-01-16 22:45           ` Junio C Hamano
2024-01-16 23:31             ` Taylor Blau
2024-01-16 23:42               ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).