* What's cooking in git.git (Sep 2020, #01; Tue, 1)
From: Junio C Hamano @ 2020-09-01 21:28 UTC (permalink / raw)
To: git
Here are the topics that have been cooking. Commits prefixed with '-' are
only in 'seen' (formerly 'pu'---proposed updates) while commits prefixed
with '+' are in 'next'. The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.
You can find the changes described here in the integration branches of the
repositories listed at
http://git-blame.blogspot.com/p/git-public-repositories.html
--------------------------------------------------
[Graduated to 'master']
* al/bisect-first-parent (2020-08-22) 1 commit
(merged to 'next' on 2020-08-24 at f95fbf45a6)
+ bisect: add first-parent option to documentation
Finishing touches.
* am/ci-wsfix (2020-08-21) 1 commit
(merged to 'next' on 2020-08-24 at 8491e031f1)
+ ci: fix inconsistent indentation
Aesthetic fix to a CI configuration file.
* dd/diff-customize-index-line-abbrev (2020-08-21) 2 commits
(merged to 'next' on 2020-08-24 at 74e842a2c8)
+ diff: index-line: respect --abbrev in object's name
+ t4013: improve diff-post-processor logic
The output from the "diff" family of the commands had abbreviated
object names of blobs involved in the patch, but its length was not
affected by the --abbrev option. Now it is.
* hn/refs-pseudorefs (2020-08-21) 4 commits
(merged to 'next' on 2020-08-24 at 3579abe8ff)
+ sequencer: treat REVERT_HEAD as a pseudo ref
+ builtin/commit: suggest update-ref for pseudoref removal
+ sequencer: treat CHERRY_PICK_HEAD as a pseudo ref
+ refs: make refs_ref_exists public
Accesses to two pseudorefs have been updated to properly use ref
API.
* hv/ref-filter-trailers-atom-parsing-fix (2020-08-21) 2 commits
(merged to 'next' on 2020-08-24 at 79b27f3263)
+ ref-filter: 'contents:trailers' show error if `:` is missing
+ t6300: unify %(trailers) and %(contents:trailers) tests
The parser for "git for-each-ref --format=..." was too loose when
parsing the "%(trailers...)" atom, and forgot that "trailers" and
"trailers:<modifiers>" are the only two allowed forms, which has
been corrected.
* jc/ident-whose-ident (2020-08-21) 1 commit
(merged to 'next' on 2020-08-27 at caf5149c28)
+ ident: say whose identity is missing when giving user.name hint
Error message update.
* jk/index-pack-w-more-threads (2020-08-21) 3 commits
(merged to 'next' on 2020-08-24 at 18f18a5b66)
+ index-pack: adjust default threading cap
+ p5302: count up to online-cpus for thread tests
+ p5302: disable thread-count parameter tests by default
Long ago, we decided to use 3 threads by default when running the
index-pack task in parallel, which has been adjusted a bit upwards.
* jk/refspecs-cleanup (2020-08-17) 2 commits
(merged to 'next' on 2020-08-24 at 807a080ebf)
+ refspec: make sure stack refspec_item variables are zeroed
+ refspec: fix documentation referring to refspec_item
(this branch is used by jk/refspecs-negative.)
Preliminary code clean-up before introducing "negative refspec".
* jk/rev-input-given-fix (2020-08-26) 1 commit
(merged to 'next' on 2020-08-27 at da291a327c)
+ revision: set rev_input_given in handle_revision_arg()
Feeding "$ZERO_OID" to "git log --ignore-missing --stdin", and
running "git log --ignore-missing $ZERO_OID" fell back to start
digging from HEAD; it has been corrected to become a no-op, like
"git log --tags=no-tag-matches-this-pattern" does.
* jt/promisor-pack-fix (2020-08-20) 1 commit
(merged to 'next' on 2020-08-24 at cd26d30d8d)
+ fetch-pack: in partial clone, pass --promisor
Updates into a lazy/partial clone with a submodule did not work
well with transfer.fsckobjects set.
* ps/ref-transaction-hook (2020-08-25) 1 commit
(merged to 'next' on 2020-08-27 at 49b3fb8349)
+ refs: remove lookup cache for reference-transaction hook
Code simplification by removing ineffective optimization.
* rp/apply-cached-doc (2020-08-20) 1 commit
(merged to 'next' on 2020-08-27 at 1d610f08ea)
+ git-apply.txt: update descriptions of --cached, --index
The description of --cached/--index options in "git apply --help"
has been updated.
* rs/checkout-no-overlay-pathspec-fix (2020-08-22) 1 commit
(merged to 'next' on 2020-08-27 at 277e39346d)
+ checkout, restore: make pathspec recursive
"git restore/checkout --no-overlay" with wildcarded pathspec
mistakenly removed matching paths in subdirectories, which has been
corrected.
--------------------------------------------------
[New Topics]
* hl/bisect-doc-clarify-bad-good-ordering (2020-08-28) 1 commit
(merged to 'next' on 2020-08-31 at 11ce613916)
+ bisect: swap command-line options in documentation
Doc update.
Will merge to 'master'.
* jc/post-checkout-doc (2020-08-27) 1 commit
- doc: clarify how exit status of post-checkout hook is used
Doc update.
Will merge to 'next'.
* jt/interpret-branch-name-fallback (2020-08-29) 2 commits
(merged to 'next' on 2020-08-31 at 01f5dc8cc0)
+ wt-status: tolerate dangling marks
+ sha1-name: replace unsigned int with option struct
"git status" has trouble showing where it came from by interpreting
reflog entries that recordcertain events, e.g. "checkout @{u}", and
gives a hard/fatal error. Even though it inherently is impossible
to give a correct answer because the reflog entries lose some
information (e.g. "@{u}" does not record what branch the user was
on hence which branch 'the upstream' needs to be computed, and even
if the record were available, the relationship between branches may
have changed), at least hide the error to allow "status" show its
output.
Will merge to 'master'.
* pb/doc-sequence-editor-configuration (2020-08-31) 1 commit
(merged to 'next' on 2020-08-31 at 506466270c)
+ doc: mention GIT_SEQUENCE_EDITOR and 'sequence.editor' more
Doc update.
Will merge to 'master'.
* pb/imap-send-updates (2020-08-31) 3 commits
- git-imap-send.txt: add note about localized Gmail folders
- git-imap-send.txt: do verify SSL certificate for gmail.com
- git-imap-send.txt: don't duplicate 'Examples' sections
"git imap-send" updates.
Will merge to 'next'.
* so/separate-field-for-m-and-diff-merges (2020-08-31) 1 commit
(merged to 'next' on 2020-08-31 at 8def2984ca)
+ revision: add separate field for "-m" of "diff-index -m"
Internal API clean-up to handle two options "diff-index" and "log"
have, which happen to share the same short form, more sensibly.
Will merge to 'master'.
* vv/send-email-with-less-secure-apps-access (2020-08-29) 1 commit
- Documentation/git-send-email.txt: Mention less secure app access might need to enable.
Doc update.
Expecting a reroll.
cf. <xmqqwo1hi9nv.fsf@gitster.c.googlers.com>
cf. <xmqqft85i72s.fsf@gitster.c.googlers.com>
* ew/decline-core-abbrev (2020-09-01) 1 commit
- core.abbrev <off|false|no> disables abbreviations
Allow the configuration to specify no abbreviation regardless of
the hash algorithm.
Expecting a reroll. The intent is very good.
* jk/xrealloc-avoid-use-after-free (2020-09-01) 1 commit
- xrealloc: do not reuse pointer freed by zero-length realloc()
It was possible for xrealloc() to send a non-NULL pointer that has
been freed, which has been fixed.
Expecting a reroll.
* pb/doc-external-diff-env (2020-09-01) 1 commit
- git.txt: correct stale 'GIT_EXTERNAL_DIFF' description
Doc update.
Will merge to 'next'.
--------------------------------------------------
[Stalled]
* jc/war-on-dashed-git (2020-08-27) 1 commit
- git: catch an attempt to run "git-foo"
(this branch uses jc/undash-in-tree-git-callers.)
The first step to remove on-disk binaries for built-in subcommands
by soliciting objections.
On hold for now.
* tb/bloom-improvements (2020-08-11) 14 commits
- builtin/commit-graph.c: introduce '--max-new-filters=<n>'
- commit-graph: rename 'split_commit_graph_opts'
- commit-graph: add large-filters bitmap chunk
- commit-graph.c: sort index into commits list
- bloom/diff: properly short-circuit on max_changes
- bloom: use provided 'struct bloom_filter_settings'
- csum-file.h: introduce 'hashwrite_be64()'
- bloom: split 'get_bloom_filter()' in two
- commit-graph.c: store maximum changed paths
- commit-graph: respect 'commitGraph.readChangedPaths'
- t/helper/test-read-graph.c: prepare repo settings
- commit-graph: pass a 'struct repository *' in more places
- t4216: use an '&&'-chain
- commit-graph: introduce 'get_bloom_filter_settings()'
Misc Bloom filter improvements.
Expecting a reroll.
It seems that the review is getting closer to result in another update.
cf. <20200811220503.GC66656@syl.lan>
* es/config-hooks (2020-07-30) 6 commits
- hook: add 'run' subcommand
- parse-options: parse into argv_array
- hook: add --porcelain to list command
- hook: add list command
- hook: scaffolding for git-hook subcommand
- doc: propose hooks managed by the config
The "hooks defined in config" topic.
Expecting a reroll.
Now jk/strvec is in 'master', we may want to see the topic reworked
on top of it. Are there unresolved issues, or does the topic need
a round of detailed review?
cf. <xmqqmu3i9kvg.fsf@gitster.c.googlers.com>
* mt/grep-sparse-checkout (2020-06-12) 6 commits
- config: add setting to ignore sparsity patterns in some cmds
- grep: honor sparse checkout patterns
- config: correctly read worktree configs in submodules
- t/helper/test-config: facilitate addition of new cli options
- t/helper/test-config: return exit codes consistently
- doc: grep: unify info on configuration variables
"git grep" has been tweaked to be limited to the sparse checkout
paths.
Review needed on 4/6; otherwise looking sane.
cf. <CABPp-BGdEyEeajYZj_rdxp=MyEQdszuyjVTax=hhYj3fOtRQUQ@mail.gmail.com>
* ls/mergetool-meld-auto-merge (2020-07-12) 2 commits
- SQUASH???
- Support auto-merge for meld to follow the vim-diff behavior
The 'meld' backend of the "git mergetool" learned to give the
underlying 'meld' the '--auto-merge' option, which would help
reduce the amount of text that requires manual merging.
Expecting a reroll.
* mf/submodule-summary-with-correct-repository (2020-06-24) 2 commits
- submodule: use submodule repository when preparing summary
- revision: use repository from rev_info when parsing commits
"git diff/show" on a change that involves a submodule used to read
the information on commits in the submodule from a wrong repository
and gave a wrong information when the commit-graph is involved.
Needs tests.
* dr/push-remoteref-fix (2020-04-23) 1 commit
- remote.c: fix handling of %(push:remoteref)
The "%(push:remoteref)" placeholder in the "--format=" argument of
"git format-patch" (and friends) only showed what got explicitly
configured, not what ref at the receiving end would be updated when
"git push" was used, as it ignored the default behaviour (e.g. update
the same ref as the source).
Expecting a reroll.
cf. <20200416152145.wp2zeibxmuyas6y6@feanor>
* mk/use-size-t-in-zlib (2018-10-15) 1 commit
- zlib.c: use size_t for size
The wrapper to call into zlib followed our long tradition to use
"unsigned long" for sizes of regions in memory, which have been
updated to use "size_t".
--------------------------------------------------
[Cooking]
* mr/bisect-in-c-2 (2020-08-31) 13 commits
- bisect--helper: retire `--bisect-autostart` subcommand
- bisect--helper: retire `--write-terms` subcommand
- bisect--helper: retire `--check-expected-revs` subcommand
- bisect--helper: reimplement `bisect_state` & `bisect_head` shell functions in C
- bisect--helper: retire `--next-all` subcommand
- bisect--helper: retire `--bisect-clean-state` subcommand
- bisect--helper: finish porting `bisect_start()` to C
- bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C
- bisect: call 'clear_commit_marks_all()' in 'bisect_next_all()'
- bisect--helper: reimplement `bisect_autostart` shell function in C
- bisect--helper: introduce new `write_in_file()` function
- bisect--helper: use '-res' in 'cmd_bisect__helper' return
- bisect--helper: BUG() in cmd_*() on invalid subcommand
Rewrite of the "git bisect" script in C continues.
* jc/undash-in-tree-git-callers (2020-08-27) 3 commits
(merged to 'next' on 2020-08-27 at 671fa2f87e)
+ credential-cache: use child_process.args
+ cvsexportcommit: do not run git programs in dashed form
+ transport-helper: do not run git-remote-ext etc. in dashed form
(this branch is used by jc/war-on-dashed-git.)
A handful of places in in-tree code still relied on being able to
execute the git subcommands, especially built-ins, in "git-foo"
form, which have been corrected.
Will merge to 'master'.
* tb/repack-clearing-midx (2020-08-28) 2 commits
(merged to 'next' on 2020-08-28 at 4204c0cb5e)
+ midx: traverse the local MIDX first
(merged to 'next' on 2020-08-27 at a465875cbb)
+ builtin/repack.c: invalidate MIDX only when necessary
When a packfile is removed by "git repack", multi-pack-index gets
cleared; the code was taught to do so less aggressively by first
checking if the midx actually refers to a pack that no longer
exists.
Will merge to 'master'.
* jc/run-command-use-embedded-args (2020-08-26) 1 commit
(merged to 'next' on 2020-08-27 at c2b688e8e9)
+ run_command: teach API users to use embedded 'args' more
Various callers of run_command API has been modernized.
Will merge to 'master'.
* es/worktree-repair (2020-08-31) 5 commits
(merged to 'next' on 2020-08-31 at 604825c5e4)
+ init: make --separate-git-dir work from within linked worktree
+ init: teach --separate-git-dir to repair linked worktrees
+ worktree: teach "repair" to fix outgoing links to worktrees
+ worktree: teach "repair" to fix worktree back-links to main worktree
+ worktree: add skeleton "repair" command
"git worktree repair" command to correct on-disk pointers between
the repository and its secondary working trees.
Will merge to 'master'.
* jk/worktree-check-clean-leakfix (2020-08-27) 1 commit
(merged to 'next' on 2020-08-31 at 220fc43629)
+ worktree: fix leak in check_clean_worktree()
Leakfix.
Will merge to 'master'.
* so/pretty-abbrev-doc (2020-08-27) 1 commit
(merged to 'next' on 2020-08-31 at d664bd0c06)
+ pretty-options.txt: fix --no-abbrev-commit description
Documentation update for "--no-abbrev-commit".
Will merge to 'master'.
* ss/submodule-summary-in-c-fixes (2020-08-27) 3 commits
- t7421: eliminate 'grep' check in t7421.4 for mingw compatibility
- submodule: fix style in function definition
- submodule: eliminate unused parameters from print_submodule_summary()
(this branch uses ss/submodule-summary-in-c.)
Fixups to a topic in 'next'.
Will merge to 'next'.
* js/no-builtins-on-disk-option (2020-08-24) 3 commits
- ci: stop linking built-ins to the dashed versions
- install: optionally skip linking/copying the built-ins
- msvc: copy the correct `.pdb` files in the Makefile target `install`
The installation procedure learned to optionally omit "git-foo"
executable files for each 'foo' built-in subcommand, which are only
required by old timers that still rely on the age old promise that
prepending "git --exec-path" output to PATH early in their script
will keep the "git-foo" calls they wrote working.
The old attempt to remove these executables from the disk failed in
the 1.6 era; it may be worth attempting again, but I think it is
worth to keep this topic separate from such a policy change to help
it graduate early.
cf. https://public-inbox.org/git/7vprnzt7d5.fsf@gitster.siamese.dyndns.org/
* jt/threaded-index-pack (2020-08-27) 9 commits
- builtin/index-pack.c: fix some sparse warnings
- fixup! index-pack: make quantum of work smaller
- index-pack: make quantum of work smaller
- index-pack: make resolve_delta() assume base data
- index-pack: calculate {ref,ofs}_{first,last} early
- index-pack: remove redundant child field
- index-pack: unify threaded and unthreaded code
- index-pack: remove redundant parameter
- Documentation: deltaBaseCacheLimit is per-thread
"git index-pack" learned to resolve deltified objects with greater
parallelism.
Expecting the final reroll.
cf. https://colabti.org/irclogger/irclogger_log/git-devel?date=2020-08-31#l82
* hv/ref-filter-misc (2020-08-28) 8 commits
- ref-filter: add `sanitize` option for 'subject' atom
- pretty: refactor `format_sanitized_subject()`
- ref-filter: add `short` modifier to 'parent' atom
- ref-filter: add `short` modifier to 'tree' atom
- ref-filter: rename `objectname` related functions and fields
- ref-filter: modify error messages in `grab_objectname()`
- ref-filter: refactor `grab_objectname()`
- ref-filter: support different email formats
The "--format=" option to the "for-each-ref" command and friends
learned a few more tricks, e.g. the ":short" suffix that applies to
"objectname" now also can be used for "parent", "tree", etc.
Will merge to 'next'.
* jk/refspecs-negative (2020-08-21) 1 commit
- refspec: add support for negative refspecs
"negative refspecs"
* jt/fetch-pack-loosen-validation-with-packfile-uri (2020-08-24) 3 commits
(merged to 'next' on 2020-08-27 at efd171f172)
+ fetch-pack: make packfile URIs work with transfer.fsckobjects
+ fetch-pack: document only_packfile in get_pack()
+ (various): document from_promisor parameter
Bugfix for "git fetch" when the packfile URI capability is in use.
Will merge to 'master'.
* mr/diff-hide-stat-wo-textual-change (2020-08-19) 1 commit
(merged to 'next' on 2020-08-27 at b9e97254ae)
+ diff: teach --stat to ignore uninteresting modifications
"git diff --stat -w" showed 0-line changes for paths whose changes
were only whitespaces, which was not intuitive. We now omit such
paths from the stat output.
Will merge to 'master'.
* pw/add-p-allowed-options-fix (2020-08-17) 2 commits
(merged to 'next' on 2020-08-27 at 6cd62753f7)
+ add -p: fix checking of user input
+ add -p: use ALLOC_GROW_BY instead of ALLOW_GROW
"git add -p" update.
Will merge to 'master'.
* jt/lazy-fetch (2020-08-18) 7 commits
(merged to 'next' on 2020-08-27 at 85f2319ba1)
+ fetch-pack: remove no_dependents code
+ promisor-remote: lazy-fetch objects in subprocess
+ fetch-pack: do not lazy-fetch during ref iteration
+ fetch: only populate existing_refs if needed
+ fetch: avoid reading submodule config until needed
+ fetch: allow refspecs specified through stdin
+ negotiator/noop: add noop fetch negotiator
Updates to on-demand fetching code in lazily cloned repositories.
Will merge to 'master'.
* jx/proc-receive-hook (2020-08-27) 10 commits
- doc: add documentation for the proc-receive hook
- transport: parse report options for tracking refs
- t5411: test updates of remote-tracking branches
- receive-pack: new config receive.procReceiveRefs
- doc: add document for capability report-status-v2
- New capability "report-status-v2" for git-push
- receive-pack: feed report options to post-receive
- receive-pack: add new proc-receive hook
- t5411: add basic test cases for proc-receive hook
- transport: not report a non-head push as a branch
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.
Looking good.
* pw/rebase-i-more-options (2020-08-26) 6 commits
(merged to 'next' on 2020-08-27 at c55cfeb247)
+ t3436: do not run git-merge-recursive in dashed form
(merged to 'next' on 2020-08-21 at ade71fd49b)
+ rebase: add --reset-author-date
+ rebase -i: support --ignore-date
+ rebase -i: support --committer-date-is-author-date
+ am: stop exporting GIT_COMMITTER_DATE
+ rebase -i: add --ignore-whitespace flag
"git rebase -i" learns a bit more options.
Will merge to 'master'.
* jk/slimmed-down (2020-08-13) 5 commits
(merged to 'next' on 2020-08-27 at bc8e9450c6)
+ drop vcs-svn experiment
+ make git-fast-import a builtin
+ make git-bugreport a builtin
+ make credential helpers builtins
+ Makefile: drop builtins from MSVC pdb list
Trim an unused binary and turn a bunch of commands into built-in.
Will merge to 'master'.
* ss/t7401-modernize (2020-08-21) 5 commits
(merged to 'next' on 2020-08-27 at 516cba9c64)
+ t7401: add a NEEDSWORK
+ t7401: change indentation for enhanced readability
+ t7401: change syntax of test_i18ncmp calls for clarity
+ t7401: use 'short' instead of 'verify' and cut in rev-parse calls
+ t7401: modernize style
Test clean-up.
Will merge to 'master'.
* ds/maintenance-part-2 (2020-08-25) 8 commits
- maintenance: add incremental-repack auto condition
- maintenance: auto-size incremental-repack batch
- maintenance: add incremental-repack task
- midx: use start_delayed_progress()
- midx: enable core.multiPackIndex by default
- maintenance: create auto condition for loose-objects
- maintenance: add loose-objects task
- maintenance: add prefetch task
(this branch uses ds/maintenance-part-1.)
"git maintenance", an extended big brother of "git gc", continues
to evolve.
* ss/submodule-summary-in-c (2020-08-12) 4 commits
(merged to 'next' on 2020-08-17 at 9bc352cb70)
+ submodule: port submodule subcommand 'summary' from shell to C
+ t7421: introduce a test script for verifying 'summary' output
+ submodule: rename helper functions to avoid ambiguity
+ submodule: remove extra line feeds between callback struct and macro
(this branch is used by ss/submodule-summary-in-c-fixes.)
Yet another subcommand of "git submodule" is getting rewritten in C.
Will merge to 'master'.
* ds/maintenance-part-1 (2020-08-25) 11 commits
- maintenance: add trace2 regions for task execution
- maintenance: add auto condition for commit-graph task
- maintenance: use pointers to check --auto
- maintenance: create maintenance.<task>.enabled config
- maintenance: take a lock on the objects directory
- maintenance: add --task option
- maintenance: add commit-graph task
- maintenance: initialize task array
- maintenance: replace run_auto_gc()
- maintenance: add --quiet option
- maintenance: create basic maintenance runner
(this branch is used by ds/maintenance-part-2.)
A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.
Almost ready for 'next'.
cf. https://colabti.org/irclogger/irclogger_log/git-devel?date=2020-08-31#l44
--------------------------------------------------
[Discarded]
* jc/remove-pack-redundant (2020-08-25) 1 commit
- pack-redundant: gauge the usage before proposing its removal
The first step to remove "git pack-redundant" by soliciting
objections.
Stop--we had some activity as late as last year.
^ permalink raw reply
* Re: [PATCH v2 03/11] merge-index: libify merge_one_path() and merge_all()
From: Junio C Hamano @ 2020-09-01 21:11 UTC (permalink / raw)
To: Alban Gruin; +Cc: git, phillip.wood
In-Reply-To: <20200901105705.6059-4-alban.gruin@gmail.com>
Alban Gruin <alban.gruin@gmail.com> writes:
> The "resolve" and "octopus" merge strategies do not call directly `git
> merge-one-file', they delegate the work to another git command, `git
> merge-index', that will loop over files in the index and call the
> specified command. Unfortunately, these functions are not part of
> libgit.a, which means that once rewritten, the strategies would still
> have to invoke `merge-one-file' by spawning a new process first.
>
> To avoid this, this moves merge_one_path(), merge_all(), and their
> helpers to merge-strategies.c. They also take a callback to dictate
> what they should do for each file. For now, only one launching a new
> process is defined to preserve the behaviour of the builtin version.
... of the "builtin" version? I thought this series is introducing
a new builtin version? Puzzled...
^ permalink raw reply
* Re: [PATCH v2 02/11] merge-one-file: rewrite in C
From: Junio C Hamano @ 2020-09-01 21:06 UTC (permalink / raw)
To: Alban Gruin; +Cc: git, phillip.wood
In-Reply-To: <20200901105705.6059-3-alban.gruin@gmail.com>
Alban Gruin <alban.gruin@gmail.com> writes:
> diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c
> new file mode 100644
> index 0000000000..306a86c2f0
> --- /dev/null
> +++ b/builtin/merge-one-file.c
> @@ -0,0 +1,85 @@
> +/*
> + * Builtin "git merge-one-file"
> + *
> + * Copyright (c) 2020 Alban Gruin
> + *
> + * Based on git-merge-one-file.sh, written by Linus Torvalds.
> + *
> + * This is the git per-file merge utility, called with
> + *
> + * argv[1] - original file SHA1 (or empty)
> + * argv[2] - file in branch1 SHA1 (or empty)
> + * argv[3] - file in branch2 SHA1 (or empty)
> + * argv[4] - pathname in repository
> + * argv[5] - original file mode (or empty)
> + * argv[6] - file in branch1 mode (or empty)
> + * argv[7] - file in branch2 mode (or empty)
> + *
> + * Handle some trivial cases. The _really_ trivial cases have been
> + * handled already by git read-tree, but that one doesn't do any merges
> + * that might change the tree layout.
> + */
> +
> +#include "cache.h"
> +#include "builtin.h"
> +#include "lockfile.h"
> +#include "merge-strategies.h"
> +
> +static const char builtin_merge_one_file_usage[] =
> + "git merge-one-file <orig blob> <our blob> <their blob> <path> "
> + "<orig mode> <our mode> <their mode>\n\n"
> + "Blob ids and modes should be empty for missing files.";
> +
> +int cmd_merge_one_file(int argc, const char **argv, const char *prefix)
> +{
> + struct object_id orig_blob, our_blob, their_blob,
> + *p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL;
> + unsigned int orig_mode = 0, our_mode = 0, their_mode = 0, ret = 0;
> + struct lock_file lock = LOCK_INIT;
> +
> + if (argc != 8)
> + usage(builtin_merge_one_file_usage);
> +
> + if (repo_read_index(the_repository) < 0)
> + die("invalid index");
> +
> + repo_hold_locked_index(the_repository, &lock, LOCK_DIE_ON_ERROR);
I do understand why we would want merge_strategies_one_file() helper
introduced by this step so that the helper can work in an arbitrary
repository (hence taking a pointer to repository structure as one of
its parameters).
But the "merge-one-file" command will always work in the_repository.
I do not see a point in using helpers that can work in an arbitrary
repository, like repo_read_index() or repo_hold_locked_index(), in
the above. I only see downsides --- it is longer to read, makes
readers wonder if there is something tricky involving another
repository going on, etc.
> + if (!get_oid(argv[1], &orig_blob)) {
> + p_orig_blob = &orig_blob;
> + orig_mode = strtol(argv[5], NULL, 8);
Write a wrapper around strtol(...,...,8) to reduce repetition, and
make sure you do not pass NULL as the second parameter to strtol()
to always check you parsed the string to the end.
> + ret = merge_strategies_one_file(the_repository,
> + p_orig_blob, p_our_blob, p_their_blob, argv[4],
> + orig_mode, our_mode, their_mode);
Here, as I said above, it is perfectly fine to pass
the_repository().
> + if (ret) {
> + rollback_lock_file(&lock);
> + return ret;
> + }
> +
> + return write_locked_index(the_repository->index, &lock, COMMIT_LOCK);
Likewise, I do not see much point in saying the_repository->index; the_index
is a perfectly fine short-hand.
> diff --git a/merge-strategies.c b/merge-strategies.c
> new file mode 100644
> index 0000000000..f2af4a894d
> --- /dev/null
> +++ b/merge-strategies.c
> @@ -0,0 +1,199 @@
> +#include "cache.h"
> +#include "dir.h"
> +#include "ll-merge.h"
> +#include "merge-strategies.h"
> +#include "xdiff-interface.h"
> +
> +static int add_to_index_cacheinfo(struct index_state *istate,
> + unsigned int mode,
> + const struct object_id *oid, const char *path)
> +{
> + struct cache_entry *ce;
> + int len, option;
> +
> + if (!verify_path(path, mode))
> + return error(_("Invalid path '%s'"), path);
> +
> + len = strlen(path);
> + ce = make_empty_cache_entry(istate, len);
> +
> + oidcpy(&ce->oid, oid);
> + memcpy(ce->name, path, len);
> + ce->ce_flags = create_ce_flags(0);
> + ce->ce_namelen = len;
> + ce->ce_mode = create_ce_mode(mode);
> + if (assume_unchanged)
> + ce->ce_flags |= CE_VALID;
> + option = ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE;
> + if (add_index_entry(istate, ce, option))
> + return error(_("%s: cannot add to the index"), path);
> +
> + return 0;
> +}
> +
> +static int checkout_from_index(struct index_state *istate, const char *path)
> +{
> + struct checkout state = CHECKOUT_INIT;
> + struct cache_entry *ce;
> +
> + state.istate = istate;
> + state.force = 1;
> + state.base_dir = "";
> + state.base_dir_len = 0;
> +
> + ce = index_file_exists(istate, path, strlen(path), 0);
> + if (checkout_entry(ce, &state, NULL, NULL) < 0)
> + return error(_("%s: cannot checkout file"), path);
> + return 0;
> +}
> +
> +static int merge_one_file_deleted(struct index_state *istate,
> + const struct object_id *orig_blob,
> + const struct object_id *our_blob,
> + const struct object_id *their_blob, const char *path,
> + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode)
> +{
> + if ((our_blob && orig_mode != our_mode) ||
> + (their_blob && orig_mode != their_mode))
> + return error(_("File %s deleted on one branch but had its "
> + "permissions changed on the other."), path);
> +
> + if (our_blob) {
> + printf(_("Removing %s\n"), path);
> +
> + if (file_exists(path))
> + remove_path(path);
> + }
> +
> + if (remove_file_from_index(istate, path))
> + return error("%s: cannot remove from the index", path);
> + return 0;
> +}
These functions we see above all are now easy to write these days,
thanks to the previous work that built many helpers to perform ommon
operations (e.g. remove_path()). Reusing them is very good.
> +static int do_merge_one_file(struct index_state *istate,
> + const struct object_id *orig_blob,
> + const struct object_id *our_blob,
> + const struct object_id *their_blob, const char *path,
> + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode)
> +{
> + int ret, i, dest;
> + mmbuffer_t result = {NULL, 0};
> + mmfile_t mmfs[3];
> + struct ll_merge_options merge_opts = {0};
> + struct cache_entry *ce;
> +
> + if (our_mode == S_IFLNK || their_mode == S_IFLNK)
> + return error(_("%s: Not merging symbolic link changes."), path);
> + else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK)
> + return error(_("%s: Not merging conflicting submodule changes."), path);
> +
> + read_mmblob(mmfs + 1, our_blob);
> + read_mmblob(mmfs + 2, their_blob);
> +
> + if (orig_blob) {
> + printf(_("Auto-merging %s\n"), path);
> + read_mmblob(mmfs + 0, orig_blob);
> + } else {
> + printf(_("Added %s in both, but differently.\n"), path);
> + read_mmblob(mmfs + 0, &null_oid);
> + }
> +
> + merge_opts.xdl_opts = XDL_MERGE_ZEALOUS_ALNUM;
> + ret = ll_merge(&result, path,
> + mmfs + 0, "orig",
> + mmfs + 1, "our",
> + mmfs + 2, "their",
> + istate, &merge_opts);
> +
> + for (i = 0; i < 3; i++)
> + free(mmfs[i].ptr);
> +
> + if (ret > 127 || !orig_blob)
> + ret = error(_("content conflict in %s"), path);
The original only checked if ret is zero or non-zero; here we
require ret to be large. Intended?
ll_merge() that called ll_xdl_merge() (i.e. the most common case)
would return the value returned from xdl_merge(), which can be -1
when we got an error before calling xdl_do_merge(). xdl_do_merge()
in turn can return -1. The most common case returns the value
returned from xdl_cleanup_merge(), which is 0 for clean merge, and
any positive integer (not clipped to 127 or 128) for conflicted one.
> + /* Create the working tree file, using "our tree" version from
> + the index, and then store the result of the merge. */
Style. (cf. Documentation/CodingGuidelines).
> + ce = index_file_exists(istate, path, strlen(path), 0);
> + if (!ce)
> + BUG("file is not present in the cache?");
> +
> + unlink(path);
> + dest = open(path, O_WRONLY | O_CREAT, ce->ce_mode);
> + write_in_full(dest, result.ptr, result.size);
If open() fails, we write to a bogus file descriptor here.
> + close(dest);
> +
> + free(result.ptr);
> +
> + if (ret && our_mode != their_mode)
> + return error(_("permission conflict: %o->%o,%o in %s"),
> + orig_mode, our_mode, their_mode, path);
> + if (ret)
> + return 1;
What is the error returning convention around here? Our usual
convention is that 0 signals a success, and negative reports an
error. Returning the value returned from add_file_to_index() below,
and error() above, are consistent with the convention, but this one
returns 1 that is not. When deviating from convention, it needs to
be documented for the callers in a comment before the function
definition.
> +
> + return add_file_to_index(istate, path, 0);
> +}
> +int merge_strategies_one_file(struct repository *r,
> + const struct object_id *orig_blob,
> + const struct object_id *our_blob,
> + const struct object_id *their_blob, const char *path,
> + unsigned int orig_mode, unsigned int our_mode,
> + unsigned int their_mode)
> +{
> + if (orig_blob &&
> + ((!their_blob && our_blob && oideq(orig_blob, our_blob)) ||
> + (!our_blob && their_blob && oideq(orig_blob, their_blob))))
> + /* Deleted in both or deleted in one and unchanged in
> + the other */
> + return merge_one_file_deleted(r->index,
> + orig_blob, our_blob, their_blob, path,
> + orig_mode, our_mode, their_mode);
> + else if (!orig_blob && our_blob && !their_blob) {
> + /* Added in one. The other side did not add and we
> + added so there is nothing to be done, except making
> + the path merged. */
> + return add_to_index_cacheinfo(r->index, our_mode, our_blob, path);
> + } else if (!orig_blob && !our_blob && their_blob) {
> + printf(_("Adding %s\n"), path);
> +
> + if (file_exists(path))
> + return error(_("untracked %s is overwritten by the merge."), path);
> +
> + if (add_to_index_cacheinfo(r->index, their_mode, their_blob, path))
> + return 1;
> + return checkout_from_index(r->index, path);
> + } else if (!orig_blob && our_blob && their_blob &&
> + oideq(our_blob, their_blob)) {
> + /* Added in both, identically (check for same
> + permissions). */
> + if (our_mode != their_mode)
> + return error(_("File %s added identically in both branches, "
> + "but permissions conflict %o->%o."),
> + path, our_mode, their_mode);
> +
> + printf(_("Adding %s\n"), path);
> +
> + if (add_to_index_cacheinfo(r->index, our_mode, our_blob, path))
> + return 1;
> + return checkout_from_index(r->index, path);
> + } else if (our_blob && their_blob)
> + /* Modified in both, but differently. */
> + return do_merge_one_file(r->index,
> + orig_blob, our_blob, their_blob, path,
> + orig_mode, our_mode, their_mode);
> + else {
> + char *orig_hex = "", *our_hex = "", *their_hex = "";
> +
> + if (orig_blob)
> + orig_hex = oid_to_hex(orig_blob);
> + if (our_blob)
> + our_hex = oid_to_hex(our_blob);
> + if (their_blob)
> + their_hex = oid_to_hex(their_blob);
Prepare three char [] buffers and use oid_to_hex_r() instead,
instead of relying that we'd have sufficient number of entries in
the rotating buffer.
> + return error(_("%s: Not handling case %s -> %s -> %s"),
> + path, orig_hex, our_hex, their_hex);
> + }
> +
> + return 0;
> +}
> diff --git a/merge-strategies.h b/merge-strategies.h
> new file mode 100644
> index 0000000000..b527d145c7
> --- /dev/null
> +++ b/merge-strategies.h
> @@ -0,0 +1,13 @@
> +#ifndef MERGE_STRATEGIES_H
> +#define MERGE_STRATEGIES_H
> +
> +#include "object.h"
> +
> +int merge_strategies_one_file(struct repository *r,
> + const struct object_id *orig_blob,
> + const struct object_id *our_blob,
> + const struct object_id *their_blob, const char *path,
> + unsigned int orig_mode, unsigned int our_mode,
> + unsigned int their_mode);
> +
> +#endif /* MERGE_STRATEGIES_H */
> diff --git a/t/t6415-merge-dir-to-symlink.sh b/t/t6415-merge-dir-to-symlink.sh
> index 2eddcc7664..5fb74e39a0 100755
> --- a/t/t6415-merge-dir-to-symlink.sh
> +++ b/t/t6415-merge-dir-to-symlink.sh
> @@ -94,7 +94,7 @@ test_expect_success SYMLINKS 'a/b was resolved as symlink' '
> test -h a/b
> '
>
> -test_expect_failure 'do not lose untracked in merge (resolve)' '
> +test_expect_success 'do not lose untracked in merge (resolve)' '
> git reset --hard &&
> git checkout baseline^0 &&
> >a/b/c/e &&
^ permalink raw reply
* Re: [GSoC][PATCH] submodule: port submodule subcommand 'add' from shell to C
From: Kaartic Sivaraam @ 2020-09-01 20:35 UTC (permalink / raw)
To: Shourya Shukla
Cc: gitster, git, christian.couder, johannes.schindelin, liu.denton
In-Reply-To: <20200831130448.GA119147@konoha>
On Mon, 2020-08-31 at 18:34 +0530, Shourya Shukla wrote:
> On 31/08 01:28, Kaartic Sivaraam wrote:
>
> This is what I have done finally:
> ---
> if (read_cache() < 0)
> die(_("index file corrupt"));
>
> if (!force) {
> if (cache_file_exists(path, strlen(path), ignore_case) ||
> cache_dir_exists(path, strlen(path)))
> die(_("'%s' already exists in the index"), path);
> } else {
> int cache_pos = cache_name_pos(path, strlen(path));
> struct cache_entry *ce = the_index.cache[cache_pos];
> if (cache_pos >= 0 && !S_ISGITLINK(ce->ce_mode))
> die(_("'%s' already exists in the index and is not a "
> "submodule"), path);
> }
> ---
>
> I did not put the 'cache_pos >= 0' at the start since I thought that it
> will unnecessarily increase an indentation level. Since we are using
> 'cache_{file,dir}_exists' in the first check and 'cache_name_pos()' in
> the second, the placement of check at another indentation level would be
> unnecessary. What do you think about this?
>
Interestingly. 'cache_dir_exists' seems to work as expected only when
the global ignore_case whose value seems to depend on core.ignorecase.
So, we can't just rely on 'cache_dir_exists to identify a directory
that has tracked contents. Apparently, the 'directory_exists_in_index'
in 'dir.c' seems to have the code that we want here (which is also the
only user of 'index_dir_exists'; the function for which
'cache_dir_exists' is a convenience wrapper.
The best idea I could think of is to expose that method and re-use it
here. Given that my kowledge about index and caching is primitive, I'm
not sure if there's a better approach. If others have a better idea for
handling this directory case, do enlighten us.
> > This is more close to what the shell version did but misses one case
> > which might or might not be covered by the test suite[1]. The case when
> > path is a directory that has tracked contents. In the shell version we
> > would get:
> >
> > $ git submodule add ../git-crypt/ builtin
> > 'builtin' already exists in the index
> > $ git submodule add --force ../git-crypt/ builtin
> > 'builtin' already exists in the index and is not a submodule
> >
> > In the C version with the above snippet we get:
> >
> > $ git submodule add --force ../git-crypt/ builtin
> > fatal: 'builtin' does not have a commit checked out
> > $ git submodule add ../git-crypt/ builtin
> > fatal: 'builtin' does not have a commit checked out
> >
> > That's not appropriate and should be fixed. I believe we could do
> > something with `cache_dir_exists` to fix this.
> >
> >
> > Footnote
> > ===
> >
> > [1]: If it's not covered already, it might be a good idea to add a test
> > for the above case.
>
> Like Junio said, we do not care if it is a file or a directory of any
> sorts, we will give the error if it already exists. Therefore, even if
> it is an untracked or a tracked one, it should not matter to us. Hence
> testing for it may not be necessary is what I feel. Why should we test
> it?
I'm guessing you misunderstood. A few things:
- We only care about tracked contents for the case in hand.
- Identifying whether a given path corresponds to a directory
which has tracked contents is tricky. Neither 'cache_name_pos'
nor 'cache_file_exists' handle this. 'cache_dir_exists' is also
not very useful as mentioned above.
So, we do have to take care when handling that case as Junio pointed
out.
--
Sivaraam
^ permalink raw reply
* [PATCH 0/2] ci: avoid ugly "failure" annotation in the GitHub workflow
From: Johannes Schindelin via GitGitGadget @ 2020-09-01 20:19 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin
Whenever the GitHub workflow runs in a fork that does not contain the
special-purpose ci-config branch, a big fat failure annotation greets the
casual reader. See e.g.
https://github.com/gitgitgadget/git/actions/runs/233438295
This is caused by the (non-fatal) failure to clone said branch. Let's avoid
that. It's distracting.
Johannes Schindelin (2):
ci: fix indentation of the `ci-config` job
ci: avoid ugly "failure" in the `ci-config` job
.github/workflows/main.yml | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
base-commit: e19713638985533ce461db072b49112da5bd2042
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-719%2Fdscho%2Favoid-error-in-ci-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-719/dscho/avoid-error-in-ci-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/719
--
gitgitgadget
^ permalink raw reply
* [PATCH 2/2] ci: avoid ugly "failure" in the `ci-config` job
From: Johannes Schindelin via GitGitGadget @ 2020-09-01 20:19 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Johannes Schindelin
In-Reply-To: <pull.719.git.1598991568.gitgitgadget@gmail.com>
From: Johannes Schindelin <johannes.schindelin@gmx.de>
In the common case where users have _not_ pushed a `ci-config` branch to
configure which branches should be included in the GitHub workflow runs,
there is a big fat ugly annotation about a failure in the run's log:
X Check failure on line 1 in .github
@github-actions github-actions / ci-config
.github#L1
Process completed with exit code 128.
The reason is that the `ci-config` job tries to clone that `ci-config`
branch, and even if it is configured to continue on error, the
annotation is displayed, and it is distracting.
Let's just handle this on the shell script level, so that the job's step
is not marked as a failure.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
.github/workflows/main.yml | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 6fd1d1a2c8..fcfd138ff1 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -12,7 +12,6 @@ jobs:
enabled: ${{ steps.check-ref.outputs.enabled }}
steps:
- name: try to clone ci-config branch
- continue-on-error: true
run: |
git -c protocol.version=2 clone \
--no-tags \
@@ -24,7 +23,7 @@ jobs:
https://github.com/${{ github.repository }} \
config-repo &&
cd config-repo &&
- git checkout HEAD -- ci/config
+ git checkout HEAD -- ci/config || : ignore
- id: check-ref
name: check whether CI is enabled for ref
run: |
--
gitgitgadget
^ permalink raw reply related
* [PATCH 1/2] ci: fix indentation of the `ci-config` job
From: Johannes Schindelin via GitGitGadget @ 2020-09-01 20:19 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Johannes Schindelin
In-Reply-To: <pull.719.git.1598991568.gitgitgadget@gmail.com>
From: Johannes Schindelin <johannes.schindelin@gmx.de>
The section added in e76eec35540f (ci: allow per-branch config for
GitHub Actions, 2020-05-07) contains a `&&`-chain that connects several
commands. The first command is actually so long that it stretches over
multiple lines, and as per usual, the continuation lines are indented one
more level than the first.
However, the subsequent commands in the `&&`-chain were also indented
one more level than the first command, which was almost certainly
unintended.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
.github/workflows/main.yml | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 30425404eb..6fd1d1a2c8 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -23,8 +23,8 @@ jobs:
--filter=blob:none \
https://github.com/${{ github.repository }} \
config-repo &&
- cd config-repo &&
- git checkout HEAD -- ci/config
+ cd config-repo &&
+ git checkout HEAD -- ci/config
- id: check-ref
name: check whether CI is enabled for ref
run: |
--
gitgitgadget
^ permalink raw reply related
* [gitk] bad quoting of user input for command line injection, with possible unintended side effects
From: Lyndon Brown @ 2020-09-01 20:07 UTC (permalink / raw)
To: git@vger.kernel.org
Hi,
So I pasted the following line (without quotes) into the 'find' textbox
of gitk, whilst in 'touching paths' mode, intending to switch to
add/remove afterwards.
"*pp_es = (i_es > 0) ? calloc( i_es, sizeof(**pp_es) ) : NULL;"
This results in an error dialog stating the following:
----
can't read output from command: standard output was redirected
can't read output from command: standard output was redirected
while executing
"open $cmd r+"
(procedure "do_file_hl" line 28)
invoked from within
"do_file_hl 46"
("after" script)
----
What prompted me to report this at this time (I've encountered and
ignored this previously) is that subsequently I switched back to git-
gui and refreshed, and was surprised to find that a new empty file had
been created called "0)".
Clearly the text has made it into some commandline execution without
proper quoting, thus the "> 0)" portion of the text has resulted in
creation of the file.
I'm using version 1:2.28.0-1 on Debian.
^ permalink raw reply
* Re: [PATCH] git.txt: correct stale 'GIT_EXTERNAL_DIFF' description
From: Junio C Hamano @ 2020-09-01 19:16 UTC (permalink / raw)
To: Philippe Blain via GitGitGadget; +Cc: git, Sean Estabrooks, Philippe Blain
In-Reply-To: <pull.718.git.1598966412371.gitgitgadget@gmail.com>
"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/Documentation/diff-generate-patch.txt b/Documentation/diff-generate-patch.txt
> index e8ed6470fb..b10ff4caa6 100644
> --- a/Documentation/diff-generate-patch.txt
> +++ b/Documentation/diff-generate-patch.txt
> @@ -10,7 +10,8 @@ linkgit:git-diff-tree[1], or
> linkgit:git-diff-files[1]
> with the `-p` option produces patch text.
> You can customize the creation of patch text via the
> -`GIT_EXTERNAL_DIFF` and the `GIT_DIFF_OPTS` environment variables.
> +`GIT_EXTERNAL_DIFF` and the `GIT_DIFF_OPTS` environment variables
> +(see linkgit:git[1]).
>
> What the -p option produces is slightly different from the traditional
> diff format:
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 2fc92586b5..98bdf0983c 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -551,8 +551,9 @@ Git Diffs
>
> `GIT_EXTERNAL_DIFF`::
> When the environment variable `GIT_EXTERNAL_DIFF` is set, the
> - program named by it is called, instead of the diff invocation
> - described above. For a path that is added, removed, or modified,
> + program named by it is called to generate diffs, and Git
> + does not use its builtin diff machinery.
> + For a path that is added, removed, or modified,
> `GIT_EXTERNAL_DIFF` is called with 7 parameters:
Excellent. Thanks. Will queue.
^ permalink raw reply
* Re: [PATCH] core.abbrev <off|false|no> disables abbreviations
From: Junio C Hamano @ 2020-09-01 19:14 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Eric Wong, git, brian m. carlson
In-Reply-To: <xmqqblipebto.fsf@gitster.c.googlers.com>
Junio C Hamano <gitster@pobox.com> writes:
> Derrick Stolee <stolee@gmail.com> writes:
>
>>> + else if (!strcasecmp(value, "false") ||
>>> + !strcasecmp(value, "no") ||
>>> + !strcasecmp(value, "off"))
>>> + default_abbrev = the_hash_algo->hexsz;
>>
>> I'm not sure we need three synonyms for "no-abbrev" here.
>
> I do not particularly mind, but if we imitate the variety of various
> boolean false, I'd prefer to see the code to parse them shared to
> avoid them drifting apart over time.
Just a clarification.
- I do not particularly mind having multiple synonyms.
- I do mind these one-off strcasecmp that will cause them to drift
away from what we do for the boolean 'false'.
Thanks.
^ permalink raw reply
* Re: [PATCH v2 3/2] credential-cache: use child_process.args
From: Junio C Hamano @ 2020-09-01 16:11 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20200901044954.GB3956172@coredump.intra.peff.net>
Jeff King <peff@peff.net> writes:
> Hmm, that is interesting. I thought it would work OK because we don't
> rely on any process-id magic for finding the daemon, etc, and instead
> talk over pipe descriptors. But that proves to be our undoing.
Yup. I also do suspect that closing all excess fds in the git
process in the middle may not be a bad idea, but we may not know
what the upper bound is.
> Perhaps git.c should be closing all descriptors after spawning the
> child. Of course that gets weird if it wants to write an error to stderr
> about spawning the child. I dunno. It seems as likely to introduce
> problems as solve them, so if nothing is broken beyond this cache-daemon
> thing, I'd just as soon leave it.
We do employ the "open extra pipe that is set to close-on-exec so
that child that failed to exec can report back" trick, but in order
to report the failure back, the standard error of the process in the
middle may have to be kept open, so let's not disturb this sleeping
dragon ;-)
^ permalink raw reply
* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Junio C Hamano @ 2020-09-01 15:58 UTC (permalink / raw)
To: Jeff King; +Cc: Derrick Stolee, git
In-Reply-To: <20200901135105.GA3284077@coredump.intra.peff.net>
Jeff King <peff@peff.net> writes:
> If we do handle it up-front, then I think we'd actually want:
>
> if (!size) {
> free(ptr);
> return xmalloc(0);
> }
>
> (i.e., to never return NULL for consistency with xmalloc() and
> xcalloc()).
Makes sense. I suspect that this is optimizing for a wrong case,
but in practice that should not matter. Not having to worry about
a request to resize to 0-byte in the remainder of the function is
actually a plus for readability, I would say.
>> > @@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
>> > memory_limit_check(size, 0);
>> > ret = realloc(ptr, size);
>> > if (!ret && !size)
>> > - ret = realloc(ptr, 1);
>> > + ret = realloc(ret, 1);
>>
>> I appreciate all the additional context for such a small change.
>
> Somebody's got to complete with you for ratio of commit message to diff
> lines. :)
>
> -Peff
^ permalink raw reply
* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Junio C Hamano @ 2020-09-01 15:56 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20200901111800.GA3115584@coredump.intra.peff.net>
Jeff King <peff@peff.net> writes:
> The simplest fix here is to just pass "ret" (which we know to be NULL)
> to the follow-up realloc(). That does mean that a system which _doesn't_
> free the original pointer would leak it. But that interpretation of the
> standard seems unlikely (if a system didn't deallocate in this case, I'd
> expect it to simply return the original pointer). If it turns out to be
> an issue, we can handle the "!size" case up front instead, before we
> call realloc() at all.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> wrapper.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/wrapper.c b/wrapper.c
> index 4ff4a9c3db..b0d375beee 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
> memory_limit_check(size, 0);
> ret = realloc(ptr, size);
> if (!ret && !size)
> - ret = realloc(ptr, 1);
> + ret = realloc(ret, 1);
The original does look bogus.
It however may be easier to reason about if we used malloc(1) in the
fallback path for "we got NULL after asking for 0-byte" instead. I
would have a hard time guessing the reason why we are reallocating
NULL without going back to this commit, reading the log and seeing
the original to see that the reason why we didn't use malloc() but
realloc() is we aimed for a minimum change, if I encounter this code
after I forgot this discussion.
Thanks.
^ permalink raw reply
* Re: [PATCH] core.abbrev <off|false|no> disables abbreviations
From: Junio C Hamano @ 2020-09-01 15:49 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Eric Wong, git, brian m. carlson
In-Reply-To: <9c00f29b-45e4-ccdf-6d81-5eabd58c875b@gmail.com>
Derrick Stolee <stolee@gmail.com> writes:
>> + else if (!strcasecmp(value, "false") ||
>> + !strcasecmp(value, "no") ||
>> + !strcasecmp(value, "off"))
>> + default_abbrev = the_hash_algo->hexsz;
>
> I'm not sure we need three synonyms for "no-abbrev" here.
I do not particularly mind, but if we imitate the variety of various
boolean false, I'd prefer to see the code to parse them shared to
avoid them drifting apart over time.
> "false" would be natural, except I think in a few places
> the config value "0" is also interpreted as "false", but
> as seen below a value of "0" snaps up to the minimum
> allowed abbreviation.
I was in the vicinity of this code recently for reviewing another
topic, but IIRC, 0 came from the UI level does get rounded up to the
minimum accepted and never reach "default_abbrev", but if you manage
to place 0 or -1 in default_abbrev here (e.g. with additional code,
like the above part with the right hand side of the assignment
updated), I think the value will propagate throughout the codepath
and causes the downstream code to do the right thing. 0 will give
you no-abbreviation (i.e. full length depending on the length of the
hash) and -1 will give you the "scale as appropriate for the size of
the object store".
I have mild preference for using 0 over hardcoded single "full
length" here. Even though we currently do not plan to allow
multiple hashes in use simultaneously in a single invocation of Git,
if that ever happens, we will regret hardcoding the_hash_algo->hexsz
on the right hand side of the assignment here, like this patch does.
Telling the downstream code in the control flow that we want no
truncation by using 0 would keep both 40-hexdigit and 64-hexdigit
hashes to their original length (as opposed to telling it to
truncate at 40 or 64 by using the_hash_algo->hexsz).
Thanks.
^ permalink raw reply
* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Andreas Schwab @ 2020-09-01 15:20 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20200901111800.GA3115584@coredump.intra.peff.net>
On Sep 01 2020, Jeff King wrote:
> diff --git a/wrapper.c b/wrapper.c
> index 4ff4a9c3db..b0d375beee 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
> memory_limit_check(size, 0);
> ret = realloc(ptr, size);
> if (!ret && !size)
> - ret = realloc(ptr, 1);
> + ret = realloc(ret, 1);
Since you already know ret == NULL, you could just say malloc(1), As
written, it looks like a typo (why use a variable to pass a constant?).
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply
* Re: [PATCH] core.abbrev <off|false|no> disables abbreviations
From: Derrick Stolee @ 2020-09-01 14:59 UTC (permalink / raw)
To: Eric Wong; +Cc: git, brian m. carlson
In-Reply-To: <20200901144323.GA14554@dcvr>
On 9/1/2020 10:43 AM, Eric Wong wrote:
> Derrick Stolee <stolee@gmail.com> wrote:
>> After we decide on the word, this patch needs:
>>
>> * Updates to Documentation/config/core.txt
>
> Will do.
Thanks.
>> * A test that works with both hash versions.
>
> Will do, though not too sure where the tests for this should be.
t3404-rebase-interactive.sh seems to already test the
core.abbrev config value. Could be a good place to provide
an extra check.
t4205-log-pretty-formats.sh could also use some references
to core.abbrev.
Thanks,
-Stolee
^ permalink raw reply
* [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Jeff King @ 2020-09-01 11:18 UTC (permalink / raw)
To: git
This patch fixes a bug where xrealloc(ptr, 0) can double-free and
corrupt the heap on some platforms (including at least glibc).
The C99 standard says of malloc (section 7.20.3):
If the size of the space requested is zero, the behavior is
implementation-defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object.
So we might get NULL back, or we might get an actual pointer (but we're
not allowed to look at its contents). To simplify our code, our
xmalloc() handles a NULL return by converting it into a single-byte
allocation. That way callers get consistent behavior. This was done way
back in 4e7a2eccc2 (?alloc: do not return NULL when asked for zero
bytes, 2005-12-29).
We also gave xcalloc() and xrealloc() the same treatment. And according
to C99, that is fine; the text above is in a paragraph that applies to
all three. But what happens to the memory we passed to realloc() in such
a case? I.e., if we do:
ret = realloc(ptr, 0);
and "ptr" is non-NULL, but we get NULL back, is "ptr" still valid? C99
doesn't cover this case specifically, but says (section 7.20.3.4):
The realloc function deallocates the old object pointed to by ptr and
returns a pointer to a new object that has the size specified by size.
So "ptr" is now deallocated, and we must only look at "ret". And since
"ret" is NULL, that means we have no allocated object at all. But that's
not quite the whole story. It also says:
If memory for the new object cannot be allocated, the old object is
not deallocated and its value is unchanged.
[...]
The realloc function returns a pointer to the new object (which may
have the same value as a pointer to the old object), or a null pointer
if the new object could not be allocated.
So if we see a NULL return with a non-zero size, we can expect that the
original object _is_ still valid. But with a zero size, it's ambiguous.
The NULL return might mean a failure (in which case the object is
valid), or it might mean that we successfully allocated nothing, and
used NULL to represent that.
The glibc manpage for realloc() explicitly says:
[...]if size is equal to zero, and ptr is not NULL, then the call is
equivalent to free(ptr).
Likewise, this StackOverflow answer:
https://stackoverflow.com/a/2135302
claims that C89 gave similar guidance (but I don't have a copy to verify
it). A comment on this answer:
https://stackoverflow.com/a/2022410
claims that Microsoft's CRT behaves the same.
But our current "retry with 1 byte" code passes the original pointer
again. So on glibc, we effectively free() the pointer and then try to
realloc() it again, which is undefined behavior.
The simplest fix here is to just pass "ret" (which we know to be NULL)
to the follow-up realloc(). That does mean that a system which _doesn't_
free the original pointer would leak it. But that interpretation of the
standard seems unlikely (if a system didn't deallocate in this case, I'd
expect it to simply return the original pointer). If it turns out to be
an issue, we can handle the "!size" case up front instead, before we
call realloc() at all.
Signed-off-by: Jeff King <peff@peff.net>
---
wrapper.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/wrapper.c b/wrapper.c
index 4ff4a9c3db..b0d375beee 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
memory_limit_check(size, 0);
ret = realloc(ptr, size);
if (!ret && !size)
- ret = realloc(ptr, 1);
+ ret = realloc(ret, 1);
if (!ret)
die("Out of memory, realloc failed");
return ret;
--
2.28.0.844.g468840f815
^ permalink raw reply related
* Re: [PATCH] core.abbrev <off|false|no> disables abbreviations
From: Eric Wong @ 2020-09-01 14:43 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git, brian m. carlson
In-Reply-To: <9c00f29b-45e4-ccdf-6d81-5eabd58c875b@gmail.com>
Derrick Stolee <stolee@gmail.com> wrote:
> On 9/1/2020 3:43 AM, Eric Wong wrote:
> > index 2bdff4457b..f2e09c72ca 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -1217,6 +1217,10 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
> > return config_error_nonbool(var);
> > if (!strcasecmp(value, "auto"))
> > default_abbrev = -1;
> > + else if (!strcasecmp(value, "false") ||
> > + !strcasecmp(value, "no") ||
> > + !strcasecmp(value, "off"))
> > + default_abbrev = the_hash_algo->hexsz;
>
> I'm not sure we need three synonyms for "no-abbrev" here.
I just used the accepted synonyms since I figured users
would be used to them, already.
> "false" would be natural, except I think in a few places
> the config value "0" is also interpreted as "false", but
> as seen below a value of "0" snaps up to the minimum
> allowed abbreviation.
>
> > else {
> > int abbrev = git_config_int(var, value);
> > if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
It actually errors out on the next line, here.
Perhaps adopting parse_opt_abbrev_cb behavior of clamping to
the minimum and maximum supported values is more consistent?
> Perhaps "core.abbrev = never" would be a good option?
*shrug*
> After we decide on the word, this patch needs:
>
> * Updates to Documentation/config/core.txt
Will do.
> * A test that works with both hash versions.
Will do, though not too sure where the tests for this should be.
Thanks for the comments, will wait a few days for comments
before sending out v2.
^ permalink raw reply
* Re: [PATCH v3 14/14] builtin/commit-graph.c: introduce '--max-new-filters=<n>'
From: SZEDER Gábor @ 2020-09-01 14:36 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, peff, dstolee, gitster
In-Reply-To: <09f6871f66bff838c067a3e0d23cd4622171f3bd.1597178915.git.me@ttaylorr.com>
On Tue, Aug 11, 2020 at 04:52:14PM -0400, Taylor Blau wrote:
> Introduce a command-line flag and configuration variable to fill in the
> 'max_new_filters' variable introduced by the previous patch.
>
> The command-line option '--max-new-filters' takes precedence over
> 'commitGraph.maxNewFilters', which is the default value.
> '--no-max-new-filters' can also be provided, which sets the value back
> to '-1', indicating that an unlimited number of new Bloom filters may be
> generated. (OPT_INTEGER only allows setting the '--no-' variant back to
> '0', hence a custom callback was used instead).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> Documentation/config/commitgraph.txt | 4 +++
> Documentation/git-commit-graph.txt | 4 +++
> bloom.c | 15 +++++++++++
> builtin/commit-graph.c | 39 +++++++++++++++++++++++++---
> commit-graph.c | 27 ++++++++++++++++---
> commit-graph.h | 1 +
> t/t4216-log-bloom.sh | 19 ++++++++++++++
> 7 files changed, 102 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
> index cff0797b54..4582c39fc4 100644
> --- a/Documentation/config/commitgraph.txt
> +++ b/Documentation/config/commitgraph.txt
> @@ -1,3 +1,7 @@
> +commitGraph.maxNewFilters::
> + Specifies the default value for the `--max-new-filters` option of `git
> + commit-graph write` (c.f., linkgit:git-commit-graph[1]).
> +
> commitGraph.readChangedPaths::
> If true, then git will use the changed-path Bloom filters in the
> commit-graph file (if it exists, and they are present). Defaults to
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 17405c73a9..9c887d5d79 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -67,6 +67,10 @@ this option is given, future commit-graph writes will automatically assume
> that this option was intended. Use `--no-changed-paths` to stop storing this
> data.
> +
> +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
> +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
> +enforced. Overrides the `commitGraph.maxNewFilters` configuration.
> ++
> With the `--split[=<strategy>]` option, write the commit-graph as a
> chain of multiple commit-graph files stored in
> `<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
> diff --git a/bloom.c b/bloom.c
> index ed54e96e57..8d07209c6b 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -51,6 +51,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
> else
> start_index = 0;
>
> + if ((start_index == end_index) &&
> + (g->bloom_large && !bitmap_get(g->bloom_large, lex_pos))) {
> + /*
> + * If the filter is zero-length, either (1) the filter has no
> + * changes, (2) the filter has too many changes, or (3) it
> + * wasn't computed (eg., due to '--max-new-filters').
> + *
> + * If either (1) or (2) is the case, the 'large' bit will be set
> + * for this Bloom filter. If it is unset, then it wasn't
> + * computed. In that case, return nothing, since we don't have
> + * that filter in the graph.
> + */
> + return 0;
> + }
> +
> filter->len = end_index - start_index;
> filter->data = (unsigned char *)(g->chunk_bloom_data +
> sizeof(unsigned char) * start_index +
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 38f5f57d15..3500a6e1f1 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = {
> N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
> N_("git commit-graph write [--object-dir <objdir>] [--append] "
> "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> - "[--changed-paths] [--[no-]progress] <split options>"),
> + "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
> + "<split options>"),
> NULL
> };
>
> @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = {
> static const char * const builtin_commit_graph_write_usage[] = {
> N_("git commit-graph write [--object-dir <objdir>] [--append] "
> "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> - "[--changed-paths] [--[no-]progress] <split options>"),
> + "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
> + "<split options>"),
> NULL
> };
>
> @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
> return 0;
> }
>
> +static int write_option_max_new_filters(const struct option *opt,
> + const char *arg,
> + int unset)
> +{
> + int *to = opt->value;
> + if (unset)
> + *to = -1;
> + else {
> + const char *s;
> + *to = strtol(arg, (char **)&s, 10);
> + if (*s)
> + return error(_("%s expects a numerical value"),
> + optname(opt, opt->flags));
> + }
> + return 0;
> +}
> +
> static int graph_write(int argc, const char **argv)
> {
> struct string_list pack_indexes = STRING_LIST_INIT_NODUP;
> @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv)
> N_("maximum ratio between two levels of a split commit-graph")),
> OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time,
> N_("only expire files older than a given date-time")),
> + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters,
> + NULL, N_("maximum number of changed-path Bloom filters to compute"),
> + 0, write_option_max_new_filters),
> OPT_END(),
> };
>
> @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv)
> write_opts.size_multiple = 2;
> write_opts.max_commits = 0;
> write_opts.expire_time = 0;
> + write_opts.max_new_filters = -1;
>
> trace2_cmd_mode("write");
>
> @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv)
> return result;
> }
>
> +static int git_commit_graph_config(const char *var, const char *value, void *cb)
> +{
> + if (!strcmp(var, "commitgraph.maxnewfilters")) {
> + write_opts.max_new_filters = git_config_int(var, value);
> + return 0;
> + }
> +
> + return git_default_config(var, value, cb);
> +}
> +
> int cmd_commit_graph(int argc, const char **argv, const char *prefix)
> {
> static struct option builtin_commit_graph_options[] = {
> @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
> usage_with_options(builtin_commit_graph_usage,
> builtin_commit_graph_options);
>
> - git_config(git_default_config, NULL);
> + git_config(git_commit_graph_config, &opts);
> argc = parse_options(argc, argv, prefix,
> builtin_commit_graph_options,
> builtin_commit_graph_usage,
> diff --git a/commit-graph.c b/commit-graph.c
> index 6886f319a5..4aae5471e3 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -954,7 +954,8 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
> }
>
> static int get_bloom_filter_large_in_graph(struct commit_graph *g,
> - const struct commit *c)
> + const struct commit *c,
> + uint32_t max_changed_paths)
> {
> uint32_t graph_pos = commit_graph_position(c);
> if (graph_pos == COMMIT_NOT_FROM_GRAPH)
> @@ -965,6 +966,17 @@ static int get_bloom_filter_large_in_graph(struct commit_graph *g,
>
> if (!(g && g->bloom_large))
> return 0;
> + if (g->bloom_filter_settings->max_changed_paths != max_changed_paths) {
> + /*
> + * Force all commits which are subject to a different
> + * 'max_changed_paths' limit to be recomputed from scratch.
> + *
> + * Note that this could likely be improved, but is ignored since
> + * all real-world graphs set the maximum number of changed paths
> + * at 512.
> + */
> + return 0;
> + }
> return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base);
> }
>
> @@ -1470,6 +1482,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
> int i;
> struct progress *progress = NULL;
> int *sorted_commits;
> + int max_new_filters;
>
> init_bloom_filters();
> ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1);
> @@ -1486,10 +1499,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
> ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp,
> &ctx->commits);
>
> + max_new_filters = ctx->opts->max_new_filters >= 0 ?
> + ctx->opts->max_new_filters : ctx->commits.nr;
> +
> for (i = 0; i < ctx->commits.nr; i++) {
> int pos = sorted_commits[i];
> struct commit *c = ctx->commits.list[pos];
> - if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) {
> + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph,
> + c,
> + ctx->bloom_settings->max_changed_paths)) {
> bitmap_set(ctx->bloom_large, pos);
> ctx->count_bloom_filter_known_large++;
> } else {
> @@ -1497,7 +1515,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
> struct bloom_filter *filter = get_or_compute_bloom_filter(
> ctx->r,
> c,
> - 1,
> + ctx->count_bloom_filter_computed < max_new_filters,
> ctx->bloom_settings,
> &computed);
> if (computed) {
> @@ -1507,7 +1525,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
> ctx->count_bloom_filter_found_large++;
> }
> }
> - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
> + if (filter)
> + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
> }
> display_progress(progress, i + 1);
> }
> diff --git a/commit-graph.h b/commit-graph.h
> index af08c4505d..75ef83708b 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -114,6 +114,7 @@ struct commit_graph_opts {
> int max_commits;
> timestamp_t expire_time;
> enum commit_graph_split_flags flags;
> + int max_new_filters;
> };
>
> /*
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 6859d85369..3aab8ffbe3 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -286,4 +286,23 @@ test_expect_success 'Bloom generation does not recompute too-large filters' '
> )
> '
>
> +test_expect_success 'Bloom generation is limited by --max-new-filters' '
> + (
> + cd limits &&
> + test_commit c2 filter &&
> + test_commit c3 filter &&
> + test_commit c4 no-filter &&
> + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \
> + 2 0 2
> + )
> +'
> +
> +test_expect_success 'Bloom generation backfills previously-skipped filters' '
> + (
> + cd limits &&
> + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \
> + 2 0 1
> + )
> +'
> +
> test_done
> --
> 2.28.0.rc1.13.ge78abce653
Something seems to be wrong in this patch, though I haven't looked
closer. Consider this test with a bit of makeshift tracing:
--- >8 ---
diff --git a/bloom.c b/bloom.c
index 8d07209c6b..1a0dec35cd 100644
--- a/bloom.c
+++ b/bloom.c
@@ -222,6 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
if (!compute_if_not_present)
return NULL;
+ printf("get_or_compute_bloom_filter(): diff: %s\n", oid_to_hex(&c->object.oid));
repo_diff_setup(r, &diffopt);
diffopt.flags.recursive = 1;
diffopt.detect_rename = 0;
diff --git a/t/t9999-test.sh b/t/t9999-test.sh
new file mode 100755
index 0000000000..0833e6ff7e
--- /dev/null
+++ b/t/t9999-test.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+test_description='test'
+
+. ./test-lib.sh
+
+test_expect_success 'test' '
+ for i in 1 2 3 4 5 6
+ do
+ git commit -q --allow-empty -m $i || return 1
+ done &&
+ git log --oneline &&
+
+ # We have 6 commits and compute 2 Bloom filters per execution,
+ # so 3 executions should be enough... but, alas, it isnt.
+ for i in 1 2 3 4 5
+ do
+ # No --split=replace!
+ git commit-graph write --reachable --changed-paths --max-new-filters=2 || return 1
+ done &&
+
+ git commit-graph write --reachable --changed-paths --max-new-filters=4
+'
+
+test_done
--- >8 ---
It's output looks like:
[...]
+ git log --oneline
13fcefa (HEAD -> master) 6
0c71516 5
a82a61c 4
54c6b2c 3
fc99def 2
a3a8cd3 1
+ git commit-graph write --reachable --changed-paths --max-new-filters=2
get_or_compute_bloom_filter(): diff: 0c71516945bf0a23813e80205961d29ebc1020e0
get_or_compute_bloom_filter(): diff: 13fcefa4bb859a15c4edc6bb01b8b6c91b4f32b6
+ git commit-graph write --reachable --changed-paths --max-new-filters=2
get_or_compute_bloom_filter(): diff: 54c6b2cd4fb50066683a197cc6d677689618505a
get_or_compute_bloom_filter(): diff: a3a8cd3c82028671bf51502d77277baf14a2f528
+ git commit-graph write --reachable --changed-paths --max-new-filters=2
get_or_compute_bloom_filter(): diff: 0c71516945bf0a23813e80205961d29ebc1020e0
get_or_compute_bloom_filter(): diff: 13fcefa4bb859a15c4edc6bb01b8b6c91b4f32b6
+ git commit-graph write --reachable --changed-paths --max-new-filters=2
get_or_compute_bloom_filter(): diff: 54c6b2cd4fb50066683a197cc6d677689618505a
get_or_compute_bloom_filter(): diff: a3a8cd3c82028671bf51502d77277baf14a2f528
+ git commit-graph write --reachable --changed-paths --max-new-filters=2
get_or_compute_bloom_filter(): diff: 0c71516945bf0a23813e80205961d29ebc1020e0
get_or_compute_bloom_filter(): diff: 13fcefa4bb859a15c4edc6bb01b8b6c91b4f32b6
+ git commit-graph write --reachable --changed-paths
get_or_compute_bloom_filter(): diff: 54c6b2cd4fb50066683a197cc6d677689618505a
get_or_compute_bloom_filter(): diff: a3a8cd3c82028671bf51502d77277baf14a2f528
get_or_compute_bloom_filter(): diff: a82a61c79b2b07c4440e292613e11a69e33ef7a2
get_or_compute_bloom_filter(): diff: fc99def8b1df27bcab7d1f4b7ced73239f9bd7ec
See how the third write with '--max-new-filters=2' computes the
filters that have already been computed by the first write instead of
those two that have never been computed? And then how the fourth
write computes filters that have already been computed by the second
write? And ultimately we'll need a write without '--max-new-filters' (or
with '--max-new-filters=<large-enough>') to compute all remaining
filters.
With '--split=replace' it appears to work as expected.
^ permalink raw reply related
* Re: [PATCH v3 12/14] commit-graph: add large-filters bitmap chunk
From: SZEDER Gábor @ 2020-09-01 14:35 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, peff, dstolee, gitster
In-Reply-To: <619e0c619dd12e2bea2b80c7d249ba66fe2a315c.1597178915.git.me@ttaylorr.com>
On Tue, Aug 11, 2020 at 04:52:07PM -0400, Taylor Blau wrote:
> When a commit has more than a certain number of changed paths (commonly
> 512), the commit-graph machinery represents it as a zero-length filter.
> This is done since having many entries in the Bloom filter has
> undesirable effects on the false positivity rate.
>
> In addition to these too-large filters, the commit-graph machinery also
> represents commits with no filter and commits with no changed paths in
> the same way.
>
> When writing a commit-graph that aggregates several incremental
> commit-graph layers (eg., with '--split=replace'), the commit-graph
> machinery first computes all of the Bloom filters that it wants to write
> but does not already know about from existing graph layers. Because we
> overload the zero-length filter in the above fashion, this leads to
> recomputing large filters over and over again.
>
> This is already undesirable, since it means that we are wasting
> considerable effort to discover that a commit with too many changed
> paths, only to throw that effort away (and then repeat the process the
> next time a roll-up is performed).
Is this really the case?
If a commit has a corresponding entry in the Bloom filters chunks,
then the commit-graph machinery does know the Bloom filter associated
with that commit. The size of that filter should not matter, i.e.
even if it is a zero-length filter, the commit-graph machinery should
know about it all the same. And as far as I can tell it does indeed,
because load_bloom_filter_from_graph() sets a non-NULL 'filter->data'
pointer even if 'filter->len' is zero, which get_bloom_filter() treats
as "we know about this", and returns early without (re)computing the
filter. Even the test 'Bloom generation does not recompute too-large
filters' added in this patch is expected to succeed, and my
superficial and makeshift testing seems to corroborate this; at least
I couldn't find a combination of commands and options that would
recompute any existing zero-length Bloom filters.
Am I missing something?
> In a subsequent patch, we will add a '--max-new-filters=<n>' option,
> which specifies an upper-bound on the number of new filters we are
> willing to compute from scratch. Suppose that there are 'N' too-large
> filters, and we specify '--max-new-filters=M'. If 'N >= M', it is
> unlikely that any filters will be generated, since we'll spend most of
> our effort on filters that we ultimately throw away. If 'N < M', filters
> will trickle in over time, but only at most 'M - N' per-write.
>
> To address this, add a new chunk which encodes a bitmap where the ith
> bit is on iff the ith commit has zero or at least 512 changed paths.
> Likewise store the maximum number of changed paths we are willing to
> store in order to prepare for eventually making this value more easily
> customizable.
I don't understand how storing this value would make it any easier to
customize.
> When computing Bloom filters, first consult the relevant
> bitmap (in the case that we are rolling up existing layers) to see if
> computing the Bloom filter from scratch would be a waste of time.
>
> This patch implements a new chunk instead of extending the existing BIDX
> and BDAT chunks because modifying these chunks would confuse old
> clients. (Eg., setting the most-significant bit in the BIDX chunk would
> confuse old clients and require a version bump).
>
> To allow using the existing bitmap code with 64-bit words, we write the
> data in network byte order from the 64-bit words. This means we also
> need to read the array from the commit-graph file by translating each
> word from network byte order using get_be64() when loading the commit
> graph. (Note that this *could* be delayed until first-use, but a later
> patch will rely on this being initialized early, so we assume the
> up-front cost when parsing instead of delaying initialization).
>
> By avoiding the need to move to new versions of the BDAT and BIDX chunk,
> we can give ourselves more time to consider whether or not other
> modifications to these chunks are worthwhile without holding up this
> change.
>
> Another approach would be to introduce a new BIDX chunk (say, one
> identified by 'BID2') which is identical to the existing BIDX chunk,
> except the most-significant bit of each offset is interpreted as "this
> filter is too big" iff looking at a BID2 chunk. This avoids having to
> write a bitmap, but forces older clients to rewrite their commit-graphs
> (as well as reduces the theoretical largest Bloom filters we couldl
And it reduces the max possible size of the BDAT chunk, and thus the
max number of commits with Bloom filters as well.
s/couldl/could/
> write, and forces us to maintain the code necessary to translate BIDX
> chunks to BID2 ones). Separately from this patch, I implemented this
> alternate approach and did not find it to be advantageous.
Let's take a step back to reconsider what should be stored in this
bitmap for a moment. Sure, setting a bit for each commit that doesn't
modify any paths or modifies too many makes it possible to repliably
identify commits that don't have Bloom filters yet. But isn't it a
bit roundabout way? I think it would be better to directly track
which commits don't have Bloom filters yet. IOW what you really want
is a, say, BNCY "Bloom filter Not Computed Yet" chunk, where we set
the corresponding bit for each commit which has an entry in the BIDX
chunk but for which a Bloom filter hasn't been computed yet.
- It's simpler and easier to explain (IMO).
- This bitmap chunk can easily be made optional: if all Bloom
filters have been computed, then the bitmap will contain all
zeros. So why bother writing it, when we can save a bit of space
instead?
- It avoids the unpleasentness of setting a bit in the _Large_ Bloom
Filters chunks for commits _not_ modifying any paths.
- Less incentive to spill implementation details to the format
specification (e.g. 512 modified paths).
Now, let's take another step back: is such a bitmap really necessary?
We could write a single-byte Bloom filter with no bits set for commits
not modifying any paths, and a single-byte Bloom filter with all bits
set for commits modifying too many paths. This is compatible with the
specs and any existing implementation should do the right thing when
reading such filters, this would allow us to interpret zero-length
filters as "not computed yet", and if that bitmap chunk won't be
optional, then this would save space as long as less than 1/8 of
commits modify no or too many paths. Unfortunately, however, all
existing zero-length Bloom filters have to be recomputed.
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> .../technical/commit-graph-format.txt | 12 +++
> bloom.h | 2 +-
> commit-graph.c | 96 ++++++++++++++++---
> commit-graph.h | 4 +
> t/t4216-log-bloom.sh | 25 ++++-
> 5 files changed, 124 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index 440541045d..5f2d9ab4d7 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -123,6 +123,18 @@ CHUNK DATA:
> of length zero.
> * The BDAT chunk is present if and only if BIDX is present.
>
> + Large Bloom Filters (ID: {'B', 'F', 'X', 'L'}) [Optional]
> + * It starts with a 32-bit unsigned integer specifying the maximum number of
> + changed-paths that can be stored in a single Bloom filter.
Should this value be the same in all elements of a commit-graph chain?
Note that in this case having different values won't affect revision
walks using modified path Bloom filters.
> + * It then contains a list of 64-bit words (the length of this list is
> + determined by the width of the chunk) which is a bitmap. The 'i'th bit is
> + set exactly when the 'i'th commit in the graph has a changed-path Bloom
> + filter with zero entries (either because the commit is empty, or because
> + it contains more than 512 changed paths).
Please make clear the byte order of these 64 bit words in the specs as
well.
Furthermore, that 512 path limit is an implementation detail, so it
would be better if it didn't leak into the specification of this new
chunk.
> + * The BFXL chunk is present only when the BIDX and BDAT chunks are
> + also present.
> +
> +
> Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
> This list of H-byte hashes describe a set of B commit-graph files that
> form a commit-graph chain. The graph position for the ith commit in this
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 21b67677ef..6859d85369 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -33,7 +33,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' '
> git commit-graph write --reachable --changed-paths
> '
> graph_read_expect () {
> - NUM_CHUNKS=5
> + NUM_CHUNKS=6
> cat >expect <<- EOF
> header: 43475048 1 1 $NUM_CHUNKS 0
> num_commits: $1
> @@ -262,5 +262,28 @@ test_expect_success 'correctly report changes over limit' '
> done
> )
> '
> +test_bloom_filters_computed () {
> + commit_graph_args=$1
> + bloom_trace_prefix="{\"filter_known_large\":$2,\"filter_found_large\":$3,\"filter_computed\":$4"
> + rm -f "$TRASH_DIRECTORY/trace.event" &&
> + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write $commit_graph_args &&
> + grep "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.event"
> +}
> +
> +test_expect_success 'Bloom generation does not recompute too-large filters' '
> + (
> + cd limits &&
> +
> + # start from scratch and rebuild
> + rm -f .git/objects/info/commit-graph &&
> + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \
> + git commit-graph write --reachable --changed-paths \
> + --split=replace &&
> + test_commit c1 filter &&
> +
> + test_bloom_filters_computed "--reachable --changed-paths --split=replace" \
> + 2 0 1
> + )
> +'
>
> test_done
> --
> 2.28.0.rc1.13.ge78abce653
>
^ permalink raw reply
* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Derrick Stolee @ 2020-09-01 14:24 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20200901135105.GA3284077@coredump.intra.peff.net>
On 9/1/2020 9:51 AM, Jeff King wrote:
> On Tue, Sep 01, 2020 at 09:04:36AM -0400, Derrick Stolee wrote:
>> Adding an `if (!size) {free(ptr); return NULL;}` block was what I
>> expected. Was that chosen just so we can rely more on the system
>> realloc(), or is there a performance implication that I'm not
>> seeing?
>
> I went back and forth on whether to do that or not. This case should
> basically never happen, so I like both the performance and readability
> of only triggering it when realloc() returns NULL. But it would get rid
> of the hand-waving above, and I doubt the performance is measurable.
>
> If we do handle it up-front, then I think we'd actually want:
>
> if (!size) {
> free(ptr);
> return xmalloc(0);
> }
>
> (i.e., to never return NULL for consistency with xmalloc() and
> xcalloc()).
Good point. In that sense, your change makes a lot more
sense for staying consistent without strangeness like xmalloc(0).
>>> @@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
>>> memory_limit_check(size, 0);
>>> ret = realloc(ptr, size);
>>> if (!ret && !size)
>>> - ret = realloc(ptr, 1);
>>> + ret = realloc(ret, 1);
>>
>> I appreciate all the additional context for such a small change.
>
> Somebody's got to complete with you for ratio of commit message to diff
> lines. :)
Pretty sure I have a long way to match, but it is important to
have goals.
-Stolee
^ permalink raw reply
* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Jeff King @ 2020-09-01 13:51 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git
In-Reply-To: <c81b7225-a663-1598-62b3-bd80457d5648@gmail.com>
On Tue, Sep 01, 2020 at 09:04:36AM -0400, Derrick Stolee wrote:
> > The simplest fix here is to just pass "ret" (which we know to be NULL)
> > to the follow-up realloc(). That does mean that a system which _doesn't_
> > free the original pointer would leak it. But that interpretation of the
> > standard seems unlikely (if a system didn't deallocate in this case, I'd
> > expect it to simply return the original pointer). If it turns out to be
> > an issue, we can handle the "!size" case up front instead, before we
> > call realloc() at all.
>
> Adding an `if (!size) {free(ptr); return NULL;}` block was what I
> expected. Was that chosen just so we can rely more on the system
> realloc(), or is there a performance implication that I'm not
> seeing?
I went back and forth on whether to do that or not. This case should
basically never happen, so I like both the performance and readability
of only triggering it when realloc() returns NULL. But it would get rid
of the hand-waving above, and I doubt the performance is measurable.
If we do handle it up-front, then I think we'd actually want:
if (!size) {
free(ptr);
return xmalloc(0);
}
(i.e., to never return NULL for consistency with xmalloc() and
xcalloc()).
> > @@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
> > memory_limit_check(size, 0);
> > ret = realloc(ptr, size);
> > if (!ret && !size)
> > - ret = realloc(ptr, 1);
> > + ret = realloc(ret, 1);
>
> I appreciate all the additional context for such a small change.
Somebody's got to complete with you for ratio of commit message to diff
lines. :)
-Peff
^ permalink raw reply
* Re: [PATCH] Makefile: add support for generating JSON compilation database
From: Philippe Blain @ 2020-09-01 13:18 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, brian m. carlson, Philippe Blain via GitGitGadget,
Git mailing list
In-Reply-To: <20200901073827.GA3967005@coredump.intra.peff.net>
> Le 1 sept. 2020 à 03:38, Jeff King <peff@peff.net> a écrit :
>
> On Sun, Aug 30, 2020 at 09:24:03PM -0700, Junio C Hamano wrote:
>
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>>
>>> On 2020-08-30 at 19:28:27, Philippe Blain via GitGitGadget wrote:
>>>> From: Philippe Blain <levraiphilippeblain@gmail.com>
>>>>
>>>> Tools based on LibClang [1] can make use of a 'JSON Compilation
>>>> Database' [2] that keeps track of the exact options used to compile a set
>>>> of source files.
>>>
>>> For additional context why this is valuable, clangd, which is a C
>>> language server protocol implementation, can use these files to
>>> determine the flags needed to compile a file so it can provide proper
>>> editor integration. As a result, editors supporting the language server
>>> protocol (such as VS Code, or Vim with a suitable plugin) can provide
>>> better searching, integration, and refactoring tools.
>>
>> I found that the proposed commit log was very weak to sell the
>> change; some of what you gave above should definitely help strenthen
>> it.
>
> Likewise. Looking at the output, I'm confused how it would help with
> things like searching and refactoring. It might be nice to spell it out
> for those of us exposed to it for the first time (I tried following the
> links but remained unenlightened).
OK, I'll improve the commit message. I'm not at all an expert in this subject,
I just had to generate a compilation database myself to use the Sourcetrail source
explorer [1] with Git so I figured I'd share what I had done. Further exploration of the
topic are in [2] and [3]. Note that I did try some of the tools listed in [2] before resorting
to modifying the Makefile, but these tools either did not work at all or produced wrong
output (ex. strings in the JSON were not properly quoted, etc.)
> I'd also be curious to hear what advantages it gives to add a new
> Makefile knob rather than just letting interested parties add -MJ to
> their CFLAGS. Is it just a convenience to create the concatenated form?
Unfortunately this would not work because the '-MJ' flag needs a file name
to know where to put the JSON fragment.
Thanks,
Philippe.
[1] www.sourcetrail.com
[2] https://sarcasm.github.io/notes/dev/compilation-database.html
[3] https://eli.thegreenplace.net/2014/05/21/compilation-databases-for-clang-based-tools
^ permalink raw reply
* [PATCH] git.txt: correct stale 'GIT_EXTERNAL_DIFF' description
From: Philippe Blain via GitGitGadget @ 2020-09-01 13:20 UTC (permalink / raw)
To: git; +Cc: Sean Estabrooks, Philippe Blain, Philippe Blain
From: Philippe Blain <levraiphilippeblain@gmail.com>
In fde97d8ac6 (Update documentation to remove incorrect GIT_DIFF_OPTS
example., 2006-11-27), the description of the 'GIT_EXTERNAL_DIFF'
variable was moved from 'diff-format.txt' to 'git.txt', and the
documentation was updated to remove a 'diff(1)' invocation since Git did
not use an external diff program anymore by default.
However, the description of 'GIT_EXTERNAL_DIFF' still mentions "instead
of the diff invocation described above", which is confusing.
Correct that outdated sentence.
Also, link to git(1) in 'diff-generate-patch.txt' when GIT_DIFF_OPTS and
GIT_EXTERNAL_DIFF are mentioned, so that users can easily know what
these variables are about.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
git.txt: correct stale 'GIT_EXTERNAL_DIFF' description
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-718%2Fphil-blain%2Fgit-external-diff-wording-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-718/phil-blain/git-external-diff-wording-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/718
Documentation/diff-generate-patch.txt | 3 ++-
Documentation/git.txt | 5 +++--
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/Documentation/diff-generate-patch.txt b/Documentation/diff-generate-patch.txt
index e8ed6470fb..b10ff4caa6 100644
--- a/Documentation/diff-generate-patch.txt
+++ b/Documentation/diff-generate-patch.txt
@@ -10,7 +10,8 @@ linkgit:git-diff-tree[1], or
linkgit:git-diff-files[1]
with the `-p` option produces patch text.
You can customize the creation of patch text via the
-`GIT_EXTERNAL_DIFF` and the `GIT_DIFF_OPTS` environment variables.
+`GIT_EXTERNAL_DIFF` and the `GIT_DIFF_OPTS` environment variables
+(see linkgit:git[1]).
What the -p option produces is slightly different from the traditional
diff format:
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 2fc92586b5..98bdf0983c 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -551,8 +551,9 @@ Git Diffs
`GIT_EXTERNAL_DIFF`::
When the environment variable `GIT_EXTERNAL_DIFF` is set, the
- program named by it is called, instead of the diff invocation
- described above. For a path that is added, removed, or modified,
+ program named by it is called to generate diffs, and Git
+ does not use its builtin diff machinery.
+ For a path that is added, removed, or modified,
`GIT_EXTERNAL_DIFF` is called with 7 parameters:
path old-file old-hex old-mode new-file new-hex new-mode
base-commit: d9cd4331470f4d9d78677f12dc79063dab832f53
--
gitgitgadget
^ permalink raw reply related
* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Derrick Stolee @ 2020-09-01 13:04 UTC (permalink / raw)
To: Jeff King, git
In-Reply-To: <20200901111800.GA3115584@coredump.intra.peff.net>
On 9/1/2020 7:18 AM, Jeff King wrote:
> This patch fixes a bug where xrealloc(ptr, 0) can double-free and
> corrupt the heap on some platforms (including at least glibc).
!!! Good find !!!
> The simplest fix here is to just pass "ret" (which we know to be NULL)
> to the follow-up realloc(). That does mean that a system which _doesn't_
> free the original pointer would leak it. But that interpretation of the
> standard seems unlikely (if a system didn't deallocate in this case, I'd
> expect it to simply return the original pointer). If it turns out to be
> an issue, we can handle the "!size" case up front instead, before we
> call realloc() at all.
Adding an `if (!size) {free(ptr); return NULL;}` block was what I
expected. Was that chosen just so we can rely more on the system
realloc(), or is there a performance implication that I'm not
seeing?
> @@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
> memory_limit_check(size, 0);
> ret = realloc(ptr, size);
> if (!ret && !size)
> - ret = realloc(ptr, 1);
> + ret = realloc(ret, 1);
I appreciate all the additional context for such a small change.
LGTM.
-Stolee
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox