Git development
 help / color / mirror / Atom feed
* [PATCH 0/12] reducing resource usage of for_each_alternate_ref
From: Jeff King @ 2017-01-24  0:37 UTC (permalink / raw)
  To: git

As I've mentioned before, I have some alternates repositories with
absurd numbers of refs, most of which are duplicates of each other[1].
There are a couple of problems I've seen:

 1. the way that for_each_alternate_ref() works has very high peak memory
    usage for this case

 2. the way that receive-pack de-duplicates the refs has high peak memory
    usage

 3. we access the alternate refs twice in fetch-pack

This fixes all three, along with a few other minor bugfixes, cleanups,
and optimizations. I've tried to order the series to keep bugfixes and
less-contentious changes near the front.

Just to give some numbers, on my worst-case repository (see [1]), this
drops peak RSS for "git clone --reference" from over 25GB to about 40MB.
Sort of, anyway.  You still pay a big CPU and RSS cost on the process in
the alternates repo that accesses packed-refs, but the 25GB came on top
of that. So this is a first pass at the low-hanging fruit.

I'll be the first to admit that this setup is insane. And some of the
optimizations are tradeoffs that help particularly the case where your
refs aren't unique. But for the most part should help _every_ case. And
in the cases where your refs are unique, either you don't have many (so
the tradeoffs are OK) or you have so many that you are pretty much
screwed no matter what (if your fetch is looking at 30 million unique
ref tips, the object storage is your problem, not looking at the refs).

A brief overview of the patches:

  [01/12]: for_each_alternate_ref: handle failure from real_pathdup()
  [02/12]: for_each_alternate_ref: stop trimming trailing slashes
  [03/12]: for_each_alternate_ref: use strbuf for path allocation

    Bugfixes and cleanups (the first one is actually a recent-ish
    regression).

  [04/12]: for_each_alternate_ref: pass name/oid instead of ref struct
  [05/12]: for_each_alternate_ref: replace transport code with for-each-ref
  [06/12]: clone: disable save_commit_buffer

    This gives the 25GB->40MB benefit. There are a bunch of caveats in
    patch 05, but I _think_ it's the right direction for the reasons I
    outlined there.

  [07/12]: fetch-pack: cache results of for_each_alternate_ref

    Just running for-each-ref in the giant alternates repo is about a
    minute of CPU, plus 10GB heap, and fetch-pack wants to do it twice.
    This drops it to once.

  [08/12]: add oidset API
  [09/12]: receive-pack: use oidset to de-duplicate .have lines

    These give another ~12GB RSS improvement when receive-pack looks at
    the alternates in my worst-case repo.

    This is less tested than the earlier ones, as we've disabled
    receive-pack looking at alternates in production (see [1] below).
    I just did them for completeness upstream.

  [10/12]: receive-pack: fix misleading namespace/.have comment
  [11/12]: receive-pack: treat namespace .have lines like alternates
  [12/12]: receive-pack: avoid duplicates between our refs and alternates

    These are optimizations to avoid more duplicate .have lines, and
    should benefit even non-insane cases. As with 8-12, not as well
    tested by me.

 Makefile               |  1 +
 builtin/clone.c        |  1 +
 builtin/receive-pack.c | 41 +++++++++++++++-------------
 fetch-pack.c           | 48 ++++++++++++++++++++++++++++-----
 object.h               |  2 +-
 oidset.c               | 49 ++++++++++++++++++++++++++++++++++
 oidset.h               | 45 +++++++++++++++++++++++++++++++
 t/t5400-send-pack.sh   | 38 ++++++++++++++++++++++++++
 transport.c            | 72 +++++++++++++++++++++++++++++++++++---------------
 transport.h            |  2 +-
 10 files changed, 250 insertions(+), 49 deletions(-)
 create mode 100644 oidset.c
 create mode 100644 oidset.h

-Peff

[1] Background, if you care:

    I've mentioned before that GitHub's repository storage uses a
    fork-network layout. Each user gets their own "fork" repository, and
    it points to "network.git" as shared storage. The network.git
    repository has a ref for each fork that is updated periodically via
    "git fetch".

    So the network.git repo contains O(nr_forks * nr_refs_in_fork) refs.
    Quite a few of these point to the same tip sha1s, as each fork has
    the same tags. One of the most pathological cases is a popular
    public repo that has ~44K tags in it. The network repo has ~80
    million refs, of which ~60K are unique. You can imagine that it
    takes some time to access the refs. Basically anything that loads
    the network.git packed-refs file takes an extra minute of CPU and
    needs ~10G RSS to store the internal cache of `packed-refs`.

    Most operations don't care about this; they work on the fork repo,
    and never look at the network refs. But we _do_ sometimes peek at
    alternate refs from receive-pack and fetch-pack to tell the other
    side about tips we have.

    These optimizations backfire completely in such a setting. Besides
    the CPU and memory spikes on the server, even just the unique refs
    add over 3MB to the ref advertisement. So for the time being, we've
    disabled them. I have patches that add a receive.advertiseAlternates
    config option, if anybody is interested.

    So why do I care?  There is one exception: when we are cloning a
    fork, we turn on the fetch-pack side of the optimizations. This is
    worth it for a large repository, as it saves us having to transfer
    any objects at all (and therefore not process them with index-pack,
    which is expensive).

    This series is also a first step towards other optimizations, like
    asking for just the alternate refs from _one_ of the forks. I hope
    one day to turn alternates optimizations back on everywhere.

^ permalink raw reply

* What's cooking in git.git (Jan 2017, #04; Mon, 23)
From: Junio C Hamano @ 2017-01-24  0:18 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

    http://git-blame.blogspot.com/p/git-public-repositories.html

--------------------------------------------------
[Graduated to "master"]

* ad/bisect-terms (2017-01-13) 1 commit
  (merged to 'next' on 2017-01-18 at 9f500d6cf5)
 + Documentation/bisect: improve on (bad|new) and (good|bad)

 Documentation fix.


* bw/read-blob-data-does-not-modify-index-state (2017-01-11) 1 commit
  (merged to 'next' on 2017-01-18 at f33363fb07)
 + index: improve constness for reading blob data

 Code clean-up.


* jk/grep-e-could-be-extended-beyond-posix (2017-01-11) 1 commit
  (merged to 'next' on 2017-01-18 at dd1b9d1fa8)
 + t7810: avoid assumption about invalid regex syntax

 Tighten a test to avoid mistaking an extended ERE regexp engine as
 a PRE regexp engine.


* rh/diff-orderfile-doc (2017-01-15) 2 commits
  (merged to 'next' on 2017-01-18 at cc2af9c628)
 + diff: document the format of the -O (diff.orderFile) file
 + diff: document behavior of relative diff.orderFile

 Documentation fix.


* sb/cd-then-git-can-be-written-as-git-c (2017-01-13) 1 commit
  (merged to 'next' on 2017-01-18 at 8923d2d001)
 + lib-submodule-update.sh: reduce use of subshell by using "git -C"

 Test clean-up.


* sb/submodule-config-tests (2017-01-12) 2 commits
  (merged to 'next' on 2017-01-18 at bd850f6ad3)
 + t7411: test lookup of uninitialized submodules
 + t7411: quote URLs

 Test updates.


* sb/submodule-embed-gitdir (2017-01-12) 1 commit
  (merged to 'next' on 2017-01-18 at 0a5e24a3f0)
 + submodule absorbgitdirs: mention in docstring help

 Help-text fix.


* sb/submodule-init (2017-01-12) 1 commit
  (merged to 'next' on 2017-01-18 at 2e8a38b1cc)
 + submodule update --init: display correct path from submodule

 Error message fix.


* sg/fix-versioncmp-with-common-suffix (2017-01-12) 8 commits
  (merged to 'next' on 2017-01-18 at f79c24a291)
 + versioncmp: generalize version sort suffix reordering
 + versioncmp: factor out helper for suffix matching
 + versioncmp: use earliest-longest contained suffix to determine sorting order
 + versioncmp: cope with common part overlapping with prerelease suffix
 + versioncmp: pass full tagnames to swap_prereleases()
 + t7004-tag: add version sort tests to show prerelease reordering issues
 + t7004-tag: use test_config helper
 + t7004-tag: delete unnecessary tags with test_when_finished

 The prereleaseSuffix feature of version comparison that is used in
 "git tag -l" did not correctly when two or more prereleases for the
 same release were present (e.g. when 2.0, 2.0-beta1, and 2.0-beta2
 are there and the code needs to compare 2.0-beta1 and 2.0-beta2).


* vn/diff-ihc-config (2017-01-12) 1 commit
  (merged to 'next' on 2017-01-18 at ac4915dbe6)
 + diff: add interhunk context config option

 "git diff" learned diff.interHunkContext configuration variable
 that gives the default value for its --inter-hunk-context option.


* ws/request-pull-code-cleanup (2017-01-15) 1 commit
  (merged to 'next' on 2017-01-18 at dfcb1405de)
 + request-pull: drop old USAGE stuff

 Code clean-up.

--------------------------------------------------
[New Topics]

* sh/grep-tree-obj-tweak-output (2017-01-20) 2 commits
 - grep: use '/' delimiter for paths
 - grep: only add delimiter if there isn't one already

 "git grep", when fed a tree-ish as an input, shows each hit
 prefixed with "<tree-ish>:<path>:<lineno>:".  As <tree-ish> is
 almost always either a commit or a tag that points at a commit, the
 early part of the output "<tree-ish>:<path>" can be used as the
 name of the blob and given to "git show".  When <tree-ish> is a
 tree given in the extended SHA-1 syntax (e.g. "<commit>:", or
 "<commit>:<dir>"), however, this results in a string that does not
 name a blob (e.g. "<commit>::<path>" or "<commit>:<dir>:<path>").
 "git grep" has been taught to be a bit more intelligent about these
 cases and omit a colon (in the former case) or use slash (in the
 latter case) to produce "<commit>:<path>" and
 "<commit>:<dir>/<path>" that can be used as the name of a blob.

 Waiting for the review discussion to settle, followed by a reroll.


* vp/show-ref-verify-head (2017-01-23) 5 commits
 - show-ref: remove dead `if (verify)' check
 - show-ref: detect dangling refs under --verify as well
 - show-ref: move --quiet handling into show_one()
 - show-ref: allow -d to work with --verify
 - show-ref: accept HEAD with --verify

 "git show-ref HEAD" used with "--verify" because the user is not
 interested in seeing refs/remotes/origin/HEAD, and used with
 "--head" because the user does not want HEAD to be filtered out,
 i.e. "git show-ref --head --verify HEAD", did not work as expected.

 Will merge to 'next'.


* bc/use-asciidoctor-opt (2017-01-23) 7 commits
 - Makefile: add a knob to enable the use of Asciidoctor
 - Documentation: move dblatex arguments into variable
 - Documentation: add XSLT to fix DocBook for Texinfo
 - Documentation: sort sources for gitman.texi
 - Documentation: remove unneeded argument in cat-texi.perl
 - Documentation: modernize cat-texi.perl
 - Documentation: fix warning in cat-texi.perl

 Asciidoctor, an alternative reimplementation of AsciiDoc, still
 needs some changes to work with documents meant to be formatted
 with AsciiDoc.  "make USE_ASCIIDOCTOR=YesPlease" to use it out of
 the box to document our pages is getting closer to reality.


* ls/travis-p4-on-macos (2017-01-23) 1 commit
  (merged to 'next' on 2017-01-23 at 2d51987faa)
 + travis-ci: fix Perforce install on macOS

 Update the definition of the MacOSX test environment used by
 TravisCI.

 Will merge to 'master'.


* rs/qsort-s (2017-01-23) 5 commits
 - ref-filter: use QSORT_S in ref_array_sort()
 - string-list: use QSORT_S in string_list_sort()
 - perf: add basic sort performance test
 - add QSORT_S
 - compat: add qsort_s()

 A few codepaths had to rely on a global variable when sorting
 elements of an array because sort(3) API does not allow extra data
 to be passed to the comparison function.  Use qsort_s() when
 natively available, and a fallback implementation of it when not,
 to eliminate the need, which is a prerequisite for making the
 codepath reentrant.

 Will merge to 'next'.

--------------------------------------------------
[Stalled]

* jc/diff-b-m (2015-02-23) 5 commits
 . WIPWIP
 . WIP: diff-b-m
 - diffcore-rename: allow easier debugging
 - diffcore-rename.c: add locate_rename_src()
 - diffcore-break: allow debugging

 "git diff -B -M" produced incorrect patch when the postimage of a
 completely rewritten file is similar to the preimage of a removed
 file; such a resulting file must not be expressed as a rename from
 other place.

 The fix in this patch is broken, unfortunately.

 Will discard.

--------------------------------------------------
[Cooking]

* jk/coding-guidelines-update (2017-01-17) 1 commit
  (merged to 'next' on 2017-01-23 at 8c57afa288)
 + CodingGuidelines: clarify multi-line brace style

 Developer doc update.

 Will merge to 'master'.


* jk/fsck-connectivity-check-fix (2017-01-17) 6 commits
  (merged to 'next' on 2017-01-23 at e8e9b76b84)
 + fsck: check HAS_OBJ more consistently
 + fsck: do not fallback "git fsck <bogus>" to "git fsck"
 + fsck: tighten error-checks of "git fsck <head>"
 + fsck: prepare dummy objects for --connectivity-check
 + fsck: report trees as dangling
 + t1450: clean up sub-objects in duplicate-entry test

 "git fsck --connectivity-check" was not working at all.

 Will merge to 'master'.


* js/exec-path-coverity-workaround (2017-01-09) 2 commits
  (merged to 'next' on 2017-01-23 at bf5dfbf860)
 + git_exec_path: do not return the result of getenv()
 + git_exec_path: avoid Coverity warning about unfree()d result

 Code cleanup.

 Will merge to 'master'.
 Split out of another topic.


* js/mingw-isatty (2017-01-18) 1 commit
  (merged to 'next' on 2017-01-23 at ae0f80e058)
 + mingw: follow-up to "replace isatty() hack"

 An update to a topic that is already in 'master'.

 Will merge to 'master'.


* js/sequencer-i-countdown-3 (2017-01-17) 38 commits
 - sequencer (rebase -i): write out the final message
 - sequencer (rebase -i): write the progress into files
 - sequencer (rebase -i): show the progress
 - sequencer (rebase -i): suggest --edit-todo upon unknown command
 - sequencer (rebase -i): show only failed cherry-picks' output
 - sequencer (rebase -i): show only failed `git commit`'s output
 - sequencer: use run_command() directly
 - sequencer: update reading author-script
 - sequencer (rebase -i): differentiate between comments and 'noop'
 - sequencer (rebase -i): implement the 'drop' command
 - sequencer (rebase -i): allow rescheduling commands
 - sequencer (rebase -i): respect strategy/strategy_opts settings
 - sequencer (rebase -i): respect the rebase.autostash setting
 - sequencer (rebase -i): run the post-rewrite hook, if needed
 - sequencer (rebase -i): record interrupted commits in rewritten, too
 - sequencer (rebase -i): copy commit notes at end
 - sequencer (rebase -i): set the reflog message consistently
 - sequencer (rebase -i): refactor setting the reflog message
 - sequencer (rebase -i): allow fast-forwarding for edit/reword
 - sequencer (rebase -i): implement the 'reword' command
 - sequencer (rebase -i): leave a patch upon error
 - sequencer (rebase -i): update refs after a successful rebase
 - sequencer (rebase -i): the todo can be empty when continuing
 - sequencer (rebase -i): skip some revert/cherry-pick specific code path
 - sequencer (rebase -i): remove CHERRY_PICK_HEAD when no longer needed
 - sequencer (rebase -i): allow continuing with staged changes
 - sequencer (rebase -i): write an author-script file
 - sequencer (rebase -i): implement the short commands
 - sequencer (rebase -i): add support for the 'fixup' and 'squash' commands
 - sequencer (rebase -i): write the 'done' file
 - sequencer (rebase -i): learn about the 'verbose' mode
 - sequencer (rebase -i): implement the 'exec' command
 - sequencer (rebase -i): implement the 'edit' command
 - sequencer (rebase -i): implement the 'noop' command
 - sequencer: support a new action: 'interactive rebase'
 - sequencer: use a helper to find the commit message
 - sequencer: move "else" keyword onto the same line as preceding brace
 - sequencer: avoid unnecessary curly braces

 The sequencer machinery has been further enhanced so that a later
 set of patches can start using it to reimplement "rebase -i".

 Will merge to 'next'.
 I think I've said everything that needs to be said on this topic.


* jk/clear-delta-base-cache-fix (2017-01-19) 1 commit
  (merged to 'next' on 2017-01-23 at 5f4af2b0a5)
 + clear_delta_base_cache(): don't modify hashmap while iterating

 A crashing bug introduced in v2.11 timeframe has been found (it is
 triggerable only in fast-import) and fixed.

 Will merge to 'master'.


* jk/describe-omit-some-refs (2017-01-19) 6 commits
 - SQUASH???
 - describe: teach describe negative pattern matches
 - describe: teach --match to accept multiple patterns
 - name-rev: add support to exclude refs by pattern match
 - name-rev: extend --refs to accept multiple patterns
 - doc: add documentation for OPT_STRING_LIST

 "git describe" and "git name-rev" have been taught to take more
 than one refname patterns to restrict the set of refs to base their
 naming output on, and also learned to take negative patterns to
 name refs not to be used for naming via their "--exclude" option.

 Will merge to 'next' after making sure SQUASH??? is correctly
 squashed in.


* js/remote-rename-with-half-configured-remote (2017-01-19) 2 commits
  (merged to 'next' on 2017-01-23 at a1b655dbac)
 + remote rename: more carefully determine whether a remote is configured
 + remote rename: demonstrate a bogus "remote exists" bug

 With anticipatory tweaking for remotes defined in ~/.gitconfig
 (e.g. "remote.origin.prune" set to true, even though there may or
 may not actually be "origin" remote defined in a particular Git
 repository), "git remote rename" and other commands misinterpreted
 and behaved as if such a non-existing remote actually existed.

 Will merge to 'master'.


* sb/in-core-index-doc (2017-01-19) 4 commits
  (merged to 'next' on 2017-01-23 at 30224463e8)
 + documentation: retire unfinished documentation
 + cache.h: document add_[file_]to_index
 + cache.h: document remove_index_entry_at
 + cache.h: document index_name_pos

 Documentation and in-code comments updates.

 Will merge to 'master'.


* sb/retire-convert-objects-from-contrib (2017-01-19) 1 commit
  (merged to 'next' on 2017-01-23 at decc1e237d)
 + contrib: remove git-convert-objects

 Remove an ancient tool left in contrib/.

 Will merge to 'master'.


* ep/commit-static-buf-cleanup (2017-01-13) 2 commits
 - builtin/commit.c: switch to xstrfmt(), instead of snprintf()
 - builtin/commit.c: remove the PATH_MAX limitation via dynamic allocation

 Code clean-up.

 Expecting a reroll.
 The tip one would instead be done with strbuf.
 cf. <CA+EOSB=4-TKpi6mr-yVbwRsFrVzE=vo4Y9Qm3VMm7pn=UB1_hQ@mail.gmail.com>


* jk/vreport-sanitize (2017-01-11) 2 commits
  (merged to 'next' on 2017-01-18 at 4bbf370981)
 + vreport: sanitize ASCII control chars
 + Revert "vreportf: avoid intermediate buffer"

 An error message with an ASCII control character like '\r' in it
 can alter the message to hide its early part, which is problematic
 when a remote side gives such an error message that the local side
 will relay with a "remote: " prefix.

 Will merge to 'master'.


* sb/unpack-trees-super-prefix (2017-01-12) 5 commits
 - SQUASH
 - unpack-trees: support super-prefix option
 - t1001: modernize style
 - t1000: modernize style
 - read-tree: use OPT_BOOL instead of OPT_SET_INT

 "git read-tree" and its underlying unpack_trees() machinery learned
 to report problematic paths prefixed with the --super-prefix option.

 Expecting a reroll.
 The first three are in good shape.  The last one needs a better
 explanation and possibly an update to its test.
 cf. <CAGZ79kaHDnVLQr8WLoaD5KKs7EqeW=KbkptF=iHpU5t054Xcdw@mail.gmail.com>


* sb/submodule-doc (2017-01-12) 3 commits
 - submodules: add a background story
 - submodule update documentation: don't repeat ourselves
 - submodule documentation: add options to the subcommand

 Needs review.


* bw/attr (2017-01-23) 27 commits
 - attr: reformat git_attr_set_direction() function
 - attr: push the bare repo check into read_attr()
 - attr: store attribute stack in attr_check structure
 - attr: tighten const correctness with git_attr and match_attr
 - attr: remove maybe-real, maybe-macro from git_attr
 - attr: eliminate global check_all_attr array
 - attr: use hashmap for attribute dictionary
 - attr: change validity check for attribute names to use positive logic
 - attr: pass struct attr_check to collect_some_attrs
 - attr: retire git_check_attrs() API
 - attr: convert git_check_attrs() callers to use the new API
 - attr: convert git_all_attrs() to use "struct attr_check"
 - attr: (re)introduce git_check_attr() and struct attr_check
 - attr: rename function and struct related to checking attributes
 - attr.c: outline the future plans by heavily commenting
 - Documentation/gitattributes.txt: fix a typo
 - attr.c: add push_stack() helper
 - attr: support quoting pathname patterns in C style
 - attr.c: plug small leak in parse_attr_line()
 - attr.c: tighten constness around "git_attr" structure
 - attr.c: simplify macroexpand_one()
 - attr.c: mark where #if DEBUG ends more clearly
 - attr.c: complete a sentence in a comment
 - attr.c: explain the lack of attr-name syntax check in parse_attr()
 - attr.c: update a stale comment on "struct match_attr"
 - attr.c: use strchrnul() to scan for one line
 - commit.c: use strchrnul() to scan for one line

 The gitattributes machinery is being taught to work better in a
 multi-threaded environment.

 Expecting a reroll.


* jk/loose-object-fsck (2017-01-15) 6 commits
  (merged to 'next' on 2017-01-23 at 4302ad090d)
 + fsck: detect trailing garbage in all object types
 + fsck: parse loose object paths directly
 + sha1_file: add read_loose_object() function
 + t1450: test fsck of packed objects
 + sha1_file: fix error message for alternate objects
 + t1450: refactor loose-object removal

 "git fsck" inspects loose objects more carefully now.

 Will merge to 'master'.


* vn/xdiff-func-context (2017-01-15) 1 commit
 - xdiff -W: relax end-of-file function detection

 "git diff -W" has been taught to handle the case where a new
 function is added at the end of the file better.

 Will hold.
 An follow-up change to go back from the line that matches the
 funcline to show comments before the function definition is still
 being discussed.


* mh/ref-remove-empty-directory (2017-01-07) 23 commits
 - files_transaction_commit(): clean up empty directories
 - try_remove_empty_parents(): teach to remove parents of reflogs, too
 - try_remove_empty_parents(): don't trash argument contents
 - try_remove_empty_parents(): rename parameter "name" -> "refname"
 - delete_ref_loose(): inline function
 - delete_ref_loose(): derive loose reference path from lock
 - log_ref_write_1(): inline function
 - log_ref_setup(): manage the name of the reflog file internally
 - log_ref_write_1(): don't depend on logfile argument
 - log_ref_setup(): pass the open file descriptor back to the caller
 - log_ref_setup(): improve robustness against races
 - log_ref_setup(): separate code for create vs non-create
 - log_ref_write(): inline function
 - rename_tmp_log(): improve error reporting
 - rename_tmp_log(): use raceproof_create_file()
 - lock_ref_sha1_basic(): use raceproof_create_file()
 - lock_ref_sha1_basic(): inline constant
 - raceproof_create_file(): new function
 - safe_create_leading_directories(): set errno on SCLD_EXISTS
 - safe_create_leading_directories_const(): preserve errno
 - t5505: use "for-each-ref" to test for the non-existence of references
 - refname_is_safe(): correct docstring
 - files_rename_ref(): tidy up whitespace

 Deletion of a branch "foo/bar" could remove .git/refs/heads/foo
 once there no longer is any other branch whose name begins with
 "foo/", but we didn't do so so far.  Now we do.

 Expecting a reroll.
 cf. <5051c78e-51f9-becd-e1a6-9c0b781d6912@alum.mit.edu>


* ls/filter-process-delayed (2017-01-08) 1 commit
 . convert: add "status=delayed" to filter process protocol

 Ejected, as does not build when merged to 'pu'.


* nd/worktree-move (2017-01-09) 6 commits
 - worktree remove: new command
 - worktree move: refuse to move worktrees with submodules
 - worktree move: accept destination as directory
 - worktree move: new command
 - worktree.c: add update_worktree_location()
 - worktree.c: add validate_worktree()

 "git worktree" learned move and remove subcommands.

 Needs review.


* nd/log-graph-configurable-colors (2017-01-19) 4 commits
 - SQUASH???
 - log --graph: customize the graph lines with config log.graphColors
 - color.c: trim leading spaces in color_parse_mem()
 - color.c: fix color_parse_mem() with value_len == 0

 Some people feel the default set of colors used by "git log --graph"
 rather limiting.  A mechanism to customize the set of colors has
 been introduced.

 Will merge to 'next' after making sure SQUASH??? is correctly
 squashed in.


* cc/split-index-config (2016-12-26) 21 commits
 - Documentation/git-update-index: explain splitIndex.*
 - Documentation/config: add splitIndex.sharedIndexExpire
 - read-cache: use freshen_shared_index() in read_index_from()
 - read-cache: refactor read_index_from()
 - t1700: test shared index file expiration
 - read-cache: unlink old sharedindex files
 - config: add git_config_get_expiry() from gc.c
 - read-cache: touch shared index files when used
 - sha1_file: make check_and_freshen_file() non static
 - Documentation/config: add splitIndex.maxPercentChange
 - t1700: add tests for splitIndex.maxPercentChange
 - read-cache: regenerate shared index if necessary
 - config: add git_config_get_max_percent_split_change()
 - Documentation/git-update-index: talk about core.splitIndex config var
 - Documentation/config: add information for core.splitIndex
 - t1700: add tests for core.splitIndex
 - update-index: warn in case of split-index incoherency
 - read-cache: add and then use tweak_split_index()
 - split-index: add {add,remove}_split_index() functions
 - config: add git_config_get_split_index()
 - config: mark an error message up for translation

 The experimental "split index" feature has gained a few
 configuration variables to make it easier to use.

 Waiting for review comments to be addressed.
 cf. <20161226102222.17150-1-chriscool@tuxfamily.org>
 cf. <a1a44640-ff6c-2294-72ac-46322eff8505@ramsayjones.plus.com>


* bw/push-submodule-only (2016-12-20) 3 commits
  (merged to 'next' on 2017-01-23 at d6cd1c60ae)
 + push: add option to push only submodules
 + submodules: add RECURSE_SUBMODULES_ONLY value
 + transport: reformat flag #defines to be more readable

 "git submodule push" learned "--recurse-submodules=only option to
 push submodules out without pushing the top-level superproject.

 Will merge to 'master'.


* ls/p4-path-encoding (2016-12-18) 1 commit
 - git-p4: fix git-p4.pathEncoding for removed files

 When "git p4" imports changelist that removes paths, it failed to
 convert pathnames when the p4 used encoding different from the one
 used on the Git side.  This has been corrected.

 Will be rerolled.
 cf. <7E1C7387-4F37-423F-803D-3B5690B49D40@gmail.com>


* js/difftool-builtin (2017-01-19) 3 commits
  (merged to 'next' on 2017-01-23 at 6f4810dbd9)
 + difftool: retire the scripted version
 + difftool: implement the functionality in the builtin
 + difftool: add a skeleton for the upcoming builtin

 Rewrite a scripted porcelain "git difftool" in C.

 Will merge to 'master'.


* sb/push-make-submodule-check-the-default (2016-11-29) 2 commits
  (merged to 'next' on 2016-12-12 at 1863e05af5)
 + push: change submodule default to check when submodules exist
 + submodule add: extend force flag to add existing repos

 Turn the default of "push.recurseSubmodules" to "check" when
 submodules seem to be in use.

 Will cook in 'next'.


* kn/ref-filter-branch-list (2017-01-10) 21 commits
 - SQUASH???
 - branch: implement '--format' option
 - branch: use ref-filter printing APIs
 - branch, tag: use porcelain output
 - ref-filter: allow porcelain to translate messages in the output
 - ref-filter: add an 'rstrip=<N>' option to atoms which deal with refnames
 - ref-filter: modify the 'lstrip=<N>' option to work with negative '<N>'
 - ref-filter: Do not abruptly die when using the 'lstrip=<N>' option
 - ref-filter: rename the 'strip' option to 'lstrip'
 - ref-filter: make remote_ref_atom_parser() use refname_atom_parser_internal()
 - ref-filter: introduce refname_atom_parser()
 - ref-filter: introduce refname_atom_parser_internal()
 - ref-filter: make "%(symref)" atom work with the ':short' modifier
 - ref-filter: add support for %(upstream:track,nobracket)
 - ref-filter: make %(upstream:track) prints "[gone]" for invalid upstreams
 - ref-filter: introduce format_ref_array_item()
 - ref-filter: move get_head_description() from branch.c
 - ref-filter: modify "%(objectname:short)" to take length
 - ref-filter: implement %(if:equals=<string>) and %(if:notequals=<string>)
 - ref-filter: include reference to 'used_atom' within 'atom_value'
 - ref-filter: implement %(if), %(then), and %(else) atoms

 The code to list branches in "git branch" has been consolidated
 with the more generic ref-filter API.

 I think this is almost ready.  Will wait for a few days, squash
 fixes in if needed and merge to 'next'.


* jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit
  (merged to 'next' on 2016-12-05 at 0c77e39cd5)
 + setup_git_env: avoid blind fall-back to ".git"

 Originally merged to 'next' on 2016-10-26

 This is the endgame of the topic to avoid blindly falling back to
 ".git" when the setup sequence said we are _not_ in Git repository.
 A corner case that happens to work right now may be broken by a
 call to die("BUG").

 Will cook in 'next'.


* pb/bisect (2016-10-18) 27 commits
 - bisect--helper: remove the dequote in bisect_start()
 - bisect--helper: retire `--bisect-auto-next` subcommand
 - bisect--helper: retire `--bisect-autostart` subcommand
 - bisect--helper: retire `--bisect-write` subcommand
 - bisect--helper: `bisect_replay` shell function in C
 - bisect--helper: `bisect_log` shell function in C
 - bisect--helper: retire `--write-terms` subcommand
 - bisect--helper: retire `--check-expected-revs` subcommand
 - bisect--helper: `bisect_state` & `bisect_head` shell function in C
 - bisect--helper: `bisect_autostart` shell function in C
 - bisect--helper: retire `--next-all` subcommand
 - bisect--helper: retire `--bisect-clean-state` subcommand
 - bisect--helper: `bisect_next` and `bisect_auto_next` shell function in C
 - t6030: no cleanup with bad merge base
 - bisect--helper: `bisect_start` shell function partially in C
 - bisect--helper: `get_terms` & `bisect_terms` shell function in C
 - bisect--helper: `bisect_next_check` & bisect_voc shell function in C
 - bisect--helper: `check_and_set_terms` shell function in C
 - bisect--helper: `bisect_write` shell function in C
 - bisect--helper: `is_expected_rev` & `check_expected_revs` shell function in C
 - bisect--helper: `bisect_reset` shell function in C
 - wrapper: move is_empty_file() and rename it as is_empty_or_missing_file()
 - t6030: explicitly test for bisection cleanup
 - bisect--helper: `bisect_clean_state` shell function in C
 - bisect--helper: `write_terms` shell function in C
 - bisect: rewrite `check_term_format` shell function in C
 - bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

 Move more parts of "git bisect" to C.

 Expecting a reroll.
 cf. <CAFZEwPPXPPHi8KiEGS9ggzNHDCGhuqMgH9Z8-Pf9GLshg8+LPA@mail.gmail.com>
 cf. <CAFZEwPM9RSTGN54dzaw9gO9iZmsYjJ_d1SjUD4EzSDDbmh-XuA@mail.gmail.com>


* st/verify-tag (2017-01-18) 6 commits
  (merged to 'next' on 2017-01-23 at 2810959427)
 + t/t7004-tag: Add --format specifier tests
 + t/t7030-verify-tag: Add --format specifier tests
 + builtin/tag: add --format argument for tag -v
 + builtin/verify-tag: add --format to verify-tag
 + ref-filter: add function to print single ref_array_item
 + gpg-interface, tag: add GPG_VERIFY_OMIT_STATUS flag

 "git tag" and "git verify-tag" learned to put GPG verification
 status in their "--format=<placeholders>" output format.

 Will merge to 'master'.


* jc/merge-drop-old-syntax (2015-04-29) 1 commit
  (merged to 'next' on 2016-12-05 at 041946dae0)
 + merge: drop 'git merge <message> HEAD <commit>' syntax

 Originally merged to 'next' on 2016-10-11

 Stop supporting "git merge <message> HEAD <commit>" syntax that has
 been deprecated since October 2007, and issues a deprecation
 warning message since v2.5.0.

 Will cook in 'next'.

--------------------------------------------------
[Discarded]

* jc/bundle (2016-03-03) 6 commits
 . index-pack: --clone-bundle option
 . Merge branch 'jc/index-pack' into jc/bundle
 . bundle v3: the beginning
 . bundle: keep a copy of bundle file name in the in-core bundle header
 . bundle: plug resource leak
 . bundle doc: 'verify' is not about verifying the bundle

 The beginning of "split bundle", which could be one of the
 ingredients to allow "git clone" traffic off of the core server
 network to CDN.

 While I think it would make it easier for people to experiment and
 build on if the topic is merged to 'next', I am at the same time a
 bit reluctant to merge an unproven new topic that introduces a new
 file format, which we may end up having to support til the end of
 time.  It is likely that to support a "prime clone from CDN", it
 would need a lot more than just "these are the heads and the pack
 data is over there", so this may not be sufficient.


* jk/nofollow-attr-ignore (2016-11-02) 5 commits
 . exclude: do not respect symlinks for in-tree .gitignore
 . attr: do not respect symlinks for in-tree .gitattributes
 . exclude: convert "check_index" into a flags field
 . attr: convert "macro_ok" into a flags field
 . add open_nofollow() helper

 As we do not follow symbolic links when reading control files like
 .gitignore and .gitattributes from the index, match the behaviour
 and not follow symbolic links when reading them from the working
 tree.  This also tightens security a bit by not leaking contents of
 an unrelated file in the error messages when it is pointed at by
 one of these files that is a symbolic link.

 Perhaps we want to cover .gitmodules too with the same mechanism?

^ permalink raw reply

* Re: [PATCH v2 0/5] string-list: make string_list_sort() reentrant
From: Jeff King @ 2017-01-23 23:54 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Johannes Schindelin
In-Reply-To: <67ac53cd-3fc0-8bd0-30f4-129281c3090f@web.de>

On Sun, Jan 22, 2017 at 06:47:18PM +0100, René Scharfe wrote:

> Use qsort_s() from C11 Annex K to make string_list_sort() safer, in
> particular when called from parallel threads.
> 
> Changes from v1:
> * Renamed HAVE_QSORT_S to HAVE_ISO_QSORT_S in Makefile to disambiguate.
> * Added basic perf test (patch 3).
> * Converted a second caller to QSORT_S, in ref-filter.c (patch 5).

This looks nicely done overall, and I appreciate the perf test.

The speed looks like a reasonable outcome. I'm torn on the qsort_r()
demo patch. I don't think it looks too bad. OTOH, I don't think I would
want to deal with the opposite-argument-order versions.

Is there any interest in people adding the ISO qsort_s() to their libc
implementations? It seems like it's been a fair number of years by now.

-Peff

^ permalink raw reply

* Re: [PATCH] rebase: pass --signoff option to git am
From: Junio C Hamano @ 2017-01-23 23:27 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: Git List
In-Reply-To: <CAOxFTczrLmWZg3720HMUA-13q9ADi_rK5k0x+TEYyKR=xR5b_w@mail.gmail.com>

Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes:

> On Mon, Jan 23, 2017 at 9:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes:
>>
>>> On Mon, Jan 23, 2017 at 7:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>>
>>>> Should we plan to extend this to the interactive backend that is
>>>> shared between rebase -i and rebase -m, too?  Or is this patch
>>>> already sufficient to cover them?
>>>
>>> AFAIK this is sufficient for both, in the sense that I've used it with
>>> git rebase -i and it works.
>>
>> That is a good news and at the same time a bit awkard one ;-)
>>
>> The mention of "passed to 'git am'" twice in the documentation and
>> help text would lead people to think "rebase -i" would not be
>> affected and (1) would need more work to do so, or (2) the user does
>> not want "rebase -i" to be unaffected for whatever reason, and gets
>> surprised to see that it actually does get affected.
>
> I'm not sure I follow. If the user doesn't want to signoff during a
> rebase, they can simply not pass --signoff. If they do, they can not
> pass it. Am I missing something?

alias.

Which also means that there needs to be --no-signoff option that can
be given to countermand an earlier --signoff, if a user did

	[alias] rb = rebase --signoff

and wants to disable it one time only with

	$ git rb --no-signoff

>
>> In any case, will queue as-is so that we won't lose the patch while
>> waiting for people to raise their opinions.
>
> Thanks.

Thanks.  The final version would also need tests, so it may be a
good time to start thinking about what aspect of this feature wants
to be protected against future breakages.

^ permalink raw reply

* [PATCH 2/5] sequencer: save/load all options
From: Giuseppe Bilotta @ 2017-01-23 22:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Giuseppe Bilotta
In-Reply-To: <20170123225221.3659-1-giuseppe.bilotta@gmail.com>

Add the missing replay_opts to save_opts and populate_opts, so that an
interrupted cherry-pick will continue with the same setup it had before
the interruption.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 sequencer.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 672c81b559..3d2f61c979 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -985,6 +985,14 @@ static int populate_opts_cb(const char *key, const char *value, void *data)
 		opts->signoff = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.allow-ff"))
 		opts->allow_ff = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.rerere-autoupdate"))
+		opts->allow_rerere_auto = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.allow-empty"))
+		opts->allow_empty = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.allow-empty-message"))
+		opts->allow_empty_message = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.keep-redundant-commits"))
+		opts->keep_redundant_commits = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.mainline"))
 		opts->mainline = git_config_int(key, value);
 	else if (!strcmp(key, "options.gpg-sign"))
@@ -1233,6 +1241,14 @@ static int save_opts(struct replay_opts *opts)
 		res |= git_config_set_in_file_gently(opts_file, "options.signoff", "true");
 	if (opts->allow_ff)
 		res |= git_config_set_in_file_gently(opts_file, "options.allow-ff", "true");
+	if (opts->allow_rerere_auto)
+		res |= git_config_set_in_file_gently(opts_file, "options.rerere-autoupdate", "true");
+	if (opts->allow_empty)
+		res |= git_config_set_in_file_gently(opts_file, "options.allow-empty", "true");
+	if (opts->allow_empty_message)
+		res |= git_config_set_in_file_gently(opts_file, "options.allow-empty-message", "true");
+	if (opts->keep_redundant_commits)
+		res |= git_config_set_in_file_gently(opts_file, "options.keep-redundant-commits", "true");
 	if (opts->mainline) {
 		struct strbuf buf = STRBUF_INIT;
 		strbuf_addf(&buf, "%d", opts->mainline);
-- 
2.11.0.616.gd72966cf44.dirty


^ permalink raw reply related

* [PATCH 4/5] cherry-pick: allow skipping only redundant commits
From: Giuseppe Bilotta @ 2017-01-23 22:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Giuseppe Bilotta
In-Reply-To: <20170123225221.3659-1-giuseppe.bilotta@gmail.com>

This allows the preservation of originally empty commits with the
combination of flags --allow-empty --skip-redundant-commits.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 Documentation/git-cherry-pick.txt |  8 ++++-
 builtin/revert.c                  | 18 +++++++++++-
 sequencer.c                       | 62 ++++++++++++++++++++++++++++++---------
 sequencer.h                       |  1 +
 4 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/Documentation/git-cherry-pick.txt b/Documentation/git-cherry-pick.txt
index ffced816d6..147e0cde0c 100644
--- a/Documentation/git-cherry-pick.txt
+++ b/Documentation/git-cherry-pick.txt
@@ -138,9 +138,15 @@ effect to your index in a row.
 	examine the commit. This option overrides that behavior and
 	creates an empty commit object.  Implies `--allow-empty`.
 
+--skip-redundant-commits::
+	Redundant commits will be skipped altogether. This does not
+	influence commits that were originally empty (see
+	`--allow-empty` and `--skip-empty`).
+
 --skip-empty::
 	This option causes empty and redundant cherry-picked commits to
-	be skipped without requesting the user intervention.
+	be skipped without requesting the user intervention. Implies
+	`--skip-redundant-commits`.
 
 --strategy=<strategy>::
 	Use the given merge strategy.  Should only be used once.
diff --git a/builtin/revert.c b/builtin/revert.c
index ffdd367f99..aca8a1d9d0 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -102,7 +102,8 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 			OPT_BOOL(0, "allow-empty", &opts->allow_empty, N_("preserve initially empty commits")),
 			OPT_BOOL(0, "allow-empty-message", &opts->allow_empty_message, N_("allow commits with empty messages")),
 			OPT_BOOL(0, "keep-redundant-commits", &opts->keep_redundant_commits, N_("keep redundant, empty commits")),
-			OPT_BOOL(0, "skip-empty", &opts->skip_empty, N_("skip redundant, empty commits")),
+			OPT_BOOL(0, "skip-empty", &opts->skip_empty, N_("skip both redundant and initially empty commits")),
+			OPT_BOOL(0, "skip-redundant-commits", &opts->skip_redundant_commits, N_("skip redundant commits")),
 			OPT_END(),
 		};
 		options = parse_options_concat(options, cp_extra);
@@ -115,6 +116,9 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 	/* implies allow_empty */
 	if (opts->keep_redundant_commits)
 		opts->allow_empty = 1;
+	/* implies skip_redundant_commits */
+	if (opts->skip_empty)
+		opts->skip_redundant_commits = 1;
 
 	/* Check for incompatible command line arguments */
 	if (cmd) {
@@ -147,6 +151,18 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 				"--edit", opts->edit,
 				NULL);
 
+	if (opts->keep_redundant_commits)
+		verify_opt_compatible(me, "--keep-redundant-commits",
+				"--skip-empty", opts->skip_empty,
+				"--skip-redundant-commits", opts->skip_redundant_commits,
+				NULL);
+
+	if (opts->keep_redundant_commits)
+		verify_opt_compatible(me, "--skip-empty",
+				"--allow-empty", opts->allow_empty,
+				"--keep-redundant-commits", opts->keep_redundant_commits,
+				NULL);
+
 	if (cmd) {
 		opts->revs = NULL;
 	} else {
diff --git a/sequencer.c b/sequencer.c
index 9c01310162..333d9112de 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -563,40 +563,70 @@ static int allow_or_skip_empty(struct replay_opts *opts, struct commit *commit)
 {
 	int index_unchanged, empty_commit;
 
-	/*
-	 * Four cases:
+	/* We have four options:
 	 *
-	 * (1) we do not allow empty at all and error out;
+	 * --allow-empty (AE)
+	 * --keep-redundant-commits (KR)
+	 * --skip-empty (SE)
+	 * --skip-redundant-commits (SR)
 	 *
-	 * (2) we skip empty commits altogether;
+	 * Additionally, if KR, then AE. And if SE, then SR.
+	 * 
+	 * We have three possible states:
+	 * Not Empty (NE)
+	 * Originally Empty (OE)
+	 * made REdundant (RE) (originally not empty)
 	 *
-	 * (3) we allow ones that were initially empty, but
-	 * forbid the ones that become empty;
+	 * NE always gets committed. The other two depend on the combination
+	 * of flags.
 	 *
-	 * (4) we allow both.
+	 *              OE outcome | RE outcome | AE  KR  SE  SR
+	 *     Case 0:  0 (error)    0 (error)     0   0   0   0
+	 *     Case 1:  1 (allow)    0 (error)     1   0   0   0
+	 * N/A Case 2:  2 (skip)     0 (error)     0   0   1   0
+	 * N/A Case 3:  0 (error)    1 (keep)      0   1   0   0
+	 *     Case 4:  1 (allow)    1 (keep)      1   1   0   0
+	 * N/A Case 5:  2 (skip)     1 (keep)      0   1   1   0
+	 *     Case 6:  0 (error)    2 (skip)      0   0   0   1
+	 *     Case 7:  1 (allow)    2 (skip)      1   0   0   1
+	 *     Case 8:  2 (skip )    2 (skip)      0   0   1   1
+	 *
+	 * TODO should we allow Case 2? If so, how?
 	 */
-	if (!opts->allow_empty && !opts->skip_empty)
+
+	/* Case 0 */
+	if (!opts->allow_empty && !opts->skip_redundant_commits)
 		return 0; /* let "git commit" barf as necessary */
 
 	index_unchanged = is_index_unchanged();
 	if (index_unchanged < 0)
 		return index_unchanged;
+
 	if (!index_unchanged)
 		return 0; /* we do not have to say --allow-empty */
 
-	if (opts->skip_empty)
-		return 2;
+	/* Here we know that the commit is either OE or RE */
 
+	/* Case 4, we don't care, result is 'allow' for both cases */
 	if (opts->keep_redundant_commits)
 		return 1;
 
+	/* Case 8, we don't care, result is 'skip' for both cases */
+	if (opts->skip_empty)
+		return 2;
+
+	/* Now we must differentiate between OE and RE,
+	 * for Case 1, 6, 7 */
 	empty_commit = is_original_commit_empty(commit);
 	if (empty_commit < 0)
 		return empty_commit;
-	if (!empty_commit)
-		return 0;
-	else
-		return 1;
+
+	/* An OE will return 1 if AE, 0 otherwise */
+	if (empty_commit)
+		return opts->allow_empty;
+
+	/* An RE will return 2 if SR, 0 otherwise */
+	return 2*opts->skip_redundant_commits;
 }
 
 enum todo_command {
@@ -1009,6 +1039,8 @@ static int populate_opts_cb(const char *key, const char *value, void *data)
 		opts->keep_redundant_commits = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.skip-empty"))
 		opts->skip_empty = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.skip-redundant-commits"))
+		opts->skip_redundant_commits = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.mainline"))
 		opts->mainline = git_config_int(key, value);
 	else if (!strcmp(key, "options.gpg-sign"))
@@ -1267,6 +1299,8 @@ static int save_opts(struct replay_opts *opts)
 		res |= git_config_set_in_file_gently(opts_file, "options.keep-redundant-commits", "true");
 	if (opts->skip_empty)
 		res |= git_config_set_in_file_gently(opts_file, "options.skip-empty", "true");
+	if (opts->skip_redundant_commits)
+		res |= git_config_set_in_file_gently(opts_file, "options.skip-redundant-commits", "true");
 	if (opts->mainline) {
 		struct strbuf buf = STRBUF_INIT;
 		strbuf_addf(&buf, "%d", opts->mainline);
diff --git a/sequencer.h b/sequencer.h
index c747cfcfc7..f8b8bd0063 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -24,6 +24,7 @@ struct replay_opts {
 	int allow_empty_message;
 	int keep_redundant_commits;
 	int skip_empty;
+	int skip_redundant_commits;;
 
 	int mainline;
 
-- 
2.11.0.616.gd72966cf44.dirty


^ permalink raw reply related

* [PATCH 1/5] sequencer: sort options load/save by struct position
From: Giuseppe Bilotta @ 2017-01-23 22:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Giuseppe Bilotta
In-Reply-To: <20170123225221.3659-1-giuseppe.bilotta@gmail.com>

No functional change. The order in which options are serialized and
reloaded is now the same in which they appear in the replay_opts
structure. This makes it easier to spot when we forget to
serialize/reload an option value.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 sequencer.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 9adb7bbf1d..672c81b559 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -975,22 +975,22 @@ static int populate_opts_cb(const char *key, const char *value, void *data)
 
 	if (!value)
 		error_flag = 0;
-	else if (!strcmp(key, "options.no-commit"))
-		opts->no_commit = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.edit"))
 		opts->edit = git_config_bool_or_int(key, value, &error_flag);
-	else if (!strcmp(key, "options.signoff"))
-		opts->signoff = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.record-origin"))
 		opts->record_origin = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.no-commit"))
+		opts->no_commit = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.signoff"))
+		opts->signoff = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.allow-ff"))
 		opts->allow_ff = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.mainline"))
 		opts->mainline = git_config_int(key, value);
-	else if (!strcmp(key, "options.strategy"))
-		git_config_string_dup(&opts->strategy, key, value);
 	else if (!strcmp(key, "options.gpg-sign"))
 		git_config_string_dup(&opts->gpg_sign, key, value);
+	else if (!strcmp(key, "options.strategy"))
+		git_config_string_dup(&opts->strategy, key, value);
 	else if (!strcmp(key, "options.strategy-option")) {
 		ALLOC_GROW(opts->xopts, opts->xopts_nr + 1, opts->xopts_alloc);
 		opts->xopts[opts->xopts_nr++] = xstrdup(value);
@@ -1223,14 +1223,14 @@ static int save_opts(struct replay_opts *opts)
 	const char *opts_file = git_path_opts_file();
 	int res = 0;
 
-	if (opts->no_commit)
-		res |= git_config_set_in_file_gently(opts_file, "options.no-commit", "true");
 	if (opts->edit)
 		res |= git_config_set_in_file_gently(opts_file, "options.edit", "true");
-	if (opts->signoff)
-		res |= git_config_set_in_file_gently(opts_file, "options.signoff", "true");
 	if (opts->record_origin)
 		res |= git_config_set_in_file_gently(opts_file, "options.record-origin", "true");
+	if (opts->no_commit)
+		res |= git_config_set_in_file_gently(opts_file, "options.no-commit", "true");
+	if (opts->signoff)
+		res |= git_config_set_in_file_gently(opts_file, "options.signoff", "true");
 	if (opts->allow_ff)
 		res |= git_config_set_in_file_gently(opts_file, "options.allow-ff", "true");
 	if (opts->mainline) {
@@ -1239,10 +1239,10 @@ static int save_opts(struct replay_opts *opts)
 		res |= git_config_set_in_file_gently(opts_file, "options.mainline", buf.buf);
 		strbuf_release(&buf);
 	}
+	if (opts->gpg_sign)
+		res |= git_config_set_in_file_gently(opts_file, "options.gpg-sign", opts->gpg_sign);
 	if (opts->strategy)
 		res |= git_config_set_in_file_gently(opts_file, "options.strategy", opts->strategy);
-	if (opts->gpg_sign)
-		res |= git_config_set_in_file_gently(opts_file, "options.gpg-sign", opts->gpg_sign);
 	if (opts->xopts) {
 		int i;
 		for (i = 0; i < opts->xopts_nr; i++)
-- 
2.11.0.616.gd72966cf44.dirty


^ permalink raw reply related

* [PATCH 3/5] cherry-pick: option to skip empty commits
From: Giuseppe Bilotta @ 2017-01-23 22:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Giuseppe Bilotta
In-Reply-To: <20170123225221.3659-1-giuseppe.bilotta@gmail.com>

This allows cherry-picking a set of commits, some of which may be
redundant, without stopping to ask for the user intervention.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 Documentation/git-cherry-pick.txt |  4 ++++
 builtin/revert.c                  |  1 +
 sequencer.c                       | 45 +++++++++++++++++++++++++++++++--------
 sequencer.h                       |  1 +
 4 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-cherry-pick.txt b/Documentation/git-cherry-pick.txt
index d35d771fc8..ffced816d6 100644
--- a/Documentation/git-cherry-pick.txt
+++ b/Documentation/git-cherry-pick.txt
@@ -138,6 +138,10 @@ effect to your index in a row.
 	examine the commit. This option overrides that behavior and
 	creates an empty commit object.  Implies `--allow-empty`.
 
+--skip-empty::
+	This option causes empty and redundant cherry-picked commits to
+	be skipped without requesting the user intervention.
+
 --strategy=<strategy>::
 	Use the given merge strategy.  Should only be used once.
 	See the MERGE STRATEGIES section in linkgit:git-merge[1]
diff --git a/builtin/revert.c b/builtin/revert.c
index 4ca5b51544..ffdd367f99 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -102,6 +102,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 			OPT_BOOL(0, "allow-empty", &opts->allow_empty, N_("preserve initially empty commits")),
 			OPT_BOOL(0, "allow-empty-message", &opts->allow_empty_message, N_("allow commits with empty messages")),
 			OPT_BOOL(0, "keep-redundant-commits", &opts->keep_redundant_commits, N_("keep redundant, empty commits")),
+			OPT_BOOL(0, "skip-empty", &opts->skip_empty, N_("skip redundant, empty commits")),
 			OPT_END(),
 		};
 		options = parse_options_concat(options, cp_extra);
diff --git a/sequencer.c b/sequencer.c
index 3d2f61c979..9c01310162 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -550,22 +550,32 @@ static int is_original_commit_empty(struct commit *commit)
 
 /*
  * Do we run "git commit" with "--allow-empty"?
+ *
+ * Or do we just skip this empty commit?
+ *
+ * Returns 1 if a commit should be done with --allow-empty,
+ *         0 if a commit should be done without --allow-empty,
+ *         2 if no commit should be done at all (skip empty commit)
+ *         negative values in case of error
+ *
  */
-static int allow_empty(struct replay_opts *opts, struct commit *commit)
+static int allow_or_skip_empty(struct replay_opts *opts, struct commit *commit)
 {
 	int index_unchanged, empty_commit;
 
 	/*
-	 * Three cases:
+	 * Four cases:
 	 *
-	 * (1) we do not allow empty at all and error out.
+	 * (1) we do not allow empty at all and error out;
 	 *
-	 * (2) we allow ones that were initially empty, but
+	 * (2) we skip empty commits altogether;
+	 *
+	 * (3) we allow ones that were initially empty, but
 	 * forbid the ones that become empty;
 	 *
-	 * (3) we allow both.
+	 * (4) we allow both.
 	 */
-	if (!opts->allow_empty)
+	if (!opts->allow_empty && !opts->skip_empty)
 		return 0; /* let "git commit" barf as necessary */
 
 	index_unchanged = is_index_unchanged();
@@ -574,6 +584,9 @@ static int allow_empty(struct replay_opts *opts, struct commit *commit)
 	if (!index_unchanged)
 		return 0; /* we do not have to say --allow-empty */
 
+	if (opts->skip_empty)
+		return 2;
+
 	if (opts->keep_redundant_commits)
 		return 1;
 
@@ -612,7 +625,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
 	const char *base_label, *next_label;
 	struct commit_message msg = { NULL, NULL, NULL, NULL };
 	struct strbuf msgbuf = STRBUF_INIT;
-	int res, unborn = 0, allow;
+	int res = 0, unborn = 0, allow;
 
 	if (opts->no_commit) {
 		/*
@@ -771,12 +784,13 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
 		goto leave;
 	}
 
-	allow = allow_empty(opts, commit);
+	allow = allow_or_skip_empty(opts, commit);
 	if (allow < 0) {
 		res = allow;
 		goto leave;
 	}
-	if (!opts->no_commit)
+	/* allow == 2 means skip this commit */
+	if (allow != 2 && !opts->no_commit)
 		res = run_git_commit(opts->edit ? NULL : git_path_merge_msg(),
 				     opts, allow, opts->edit, 0, 0);
 
@@ -993,6 +1007,8 @@ static int populate_opts_cb(const char *key, const char *value, void *data)
 		opts->allow_empty_message = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.keep-redundant-commits"))
 		opts->keep_redundant_commits = git_config_bool_or_int(key, value, &error_flag);
+	else if (!strcmp(key, "options.skip-empty"))
+		opts->skip_empty = git_config_bool_or_int(key, value, &error_flag);
 	else if (!strcmp(key, "options.mainline"))
 		opts->mainline = git_config_int(key, value);
 	else if (!strcmp(key, "options.gpg-sign"))
@@ -1249,6 +1265,8 @@ static int save_opts(struct replay_opts *opts)
 		res |= git_config_set_in_file_gently(opts_file, "options.allow-empty-message", "true");
 	if (opts->keep_redundant_commits)
 		res |= git_config_set_in_file_gently(opts_file, "options.keep-redundant-commits", "true");
+	if (opts->skip_empty)
+		res |= git_config_set_in_file_gently(opts_file, "options.skip-empty", "true");
 	if (opts->mainline) {
 		struct strbuf buf = STRBUF_INIT;
 		strbuf_addf(&buf, "%d", opts->mainline);
@@ -1322,6 +1340,14 @@ int sequencer_continue(struct replay_opts *opts)
 	if ((res = read_populate_todo(&todo_list, opts)))
 		goto release_todo_list;
 
+	/* check if there is something to commit */
+	res = is_index_unchanged();
+	if (res < 0)
+		goto release_todo_list;
+
+	if (res && opts->skip_empty)
+		goto skip_this_commit;
+
 	/* Verify that the conflict has been resolved */
 	if (file_exists(git_path_cherry_pick_head()) ||
 	    file_exists(git_path_revert_head())) {
@@ -1333,6 +1359,7 @@ int sequencer_continue(struct replay_opts *opts)
 		res = error_dirty_index(opts);
 		goto release_todo_list;
 	}
+skip_this_commit:
 	todo_list.current++;
 	res = pick_commits(&todo_list, opts);
 release_todo_list:
diff --git a/sequencer.h b/sequencer.h
index 7a513c576b..c747cfcfc7 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -23,6 +23,7 @@ struct replay_opts {
 	int allow_empty;
 	int allow_empty_message;
 	int keep_redundant_commits;
+	int skip_empty;
 
 	int mainline;
 
-- 
2.11.0.616.gd72966cf44.dirty


^ permalink raw reply related

* [PATCH 5/5] sequencer: allow to --skip current commit
From: Giuseppe Bilotta @ 2017-01-23 22:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Giuseppe Bilotta
In-Reply-To: <20170123225221.3659-1-giuseppe.bilotta@gmail.com>

If a sequencing gets interrupted (by a conflict or an empty commit or
whatever), the user can now opt to just skip it passing the `--skip`
command line option, which acts like a `--continue`, except that the
current commit gets skipped.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 Documentation/sequencer.txt | 10 +++++++++-
 builtin/revert.c            |  7 +++++--
 sequencer.c                 | 32 ++++++++++++++++++++++++++++----
 sequencer.h                 |  2 +-
 4 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/Documentation/sequencer.txt b/Documentation/sequencer.txt
index 5747f442f2..095d6cd732 100644
--- a/Documentation/sequencer.txt
+++ b/Documentation/sequencer.txt
@@ -1,7 +1,15 @@
 --continue::
 	Continue the operation in progress using the information in
 	'.git/sequencer'.  Can be used to continue after resolving
-	conflicts in a failed cherry-pick or revert.
+	conflicts in a failed cherry-pick or revert.  Use `--skip`
+	instead if the current commit should be ignored.
+
+--skip::
+	Skips the current commit, and then continues the operation
+	in progress using the information in '.git/sequencer'.i
+	Can be used to continue to a cherry-pick or rever that was
+	interrupted by an empty commit, or by a commit that conflicts
+	and for which the resolution is to discard the commit.
 
 --quit::
 	Forget about the current operation in progress.  Can be used
diff --git a/builtin/revert.c b/builtin/revert.c
index aca8a1d9d0..dece0bebf7 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -79,6 +79,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 	struct option base_options[] = {
 		OPT_CMDMODE(0, "quit", &cmd, N_("end revert or cherry-pick sequence"), 'q'),
 		OPT_CMDMODE(0, "continue", &cmd, N_("resume revert or cherry-pick sequence"), 'c'),
+		OPT_CMDMODE(0, "skip", &cmd, N_("resume revert or cherry-pick sequence, skipping this commit"), 's'),
 		OPT_CMDMODE(0, "abort", &cmd, N_("cancel revert or cherry-pick sequence"), 'a'),
 		OPT_BOOL('n', "no-commit", &opts->no_commit, N_("don't automatically commit")),
 		OPT_BOOL('e', "edit", &opts->edit, N_("edit the commit message")),
@@ -127,6 +128,8 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 			this_operation = "--quit";
 		else if (cmd == 'c')
 			this_operation = "--continue";
+		else if (cmd == 's')
+			this_operation = "--skip";
 		else {
 			assert(cmd == 'a');
 			this_operation = "--abort";
@@ -188,8 +191,8 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 
 	if (cmd == 'q')
 		return sequencer_remove_state(opts);
-	if (cmd == 'c')
-		return sequencer_continue(opts);
+	if (cmd == 'c' || cmd == 's')
+		return sequencer_continue(opts, cmd);
 	if (cmd == 'a')
 		return sequencer_rollback(opts);
 	return sequencer_pick_revisions(opts);
diff --git a/sequencer.c b/sequencer.c
index 333d9112de..cfe8c06989 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1359,21 +1359,45 @@ static int continue_single_pick(void)
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }
 
-int sequencer_continue(struct replay_opts *opts)
+/*
+ * Continue the sequencing, after either committing
+ * (cmd == 'c') or skipping (cmd == 's') the current
+ * commit.
+ */
+int sequencer_continue(struct replay_opts *opts, char cmd)
 {
 	struct todo_list todo_list = TODO_LIST_INIT;
-	int res;
+	int single, res;
 
 	if (read_and_refresh_cache(opts))
 		return -1;
 
-	if (!file_exists(get_todo_path(opts)))
-		return continue_single_pick();
+	if (!file_exists(get_todo_path(opts))) {
+		if (cmd == 'c') {
+			return continue_single_pick();
+		} else {
+			assert(cmd == 's');
+			/* Skipping the only commit is equivalent to an abort */
+			return sequencer_rollback(opts);
+		}
+	}
 	if (read_populate_opts(opts))
 		return -1;
 	if ((res = read_populate_todo(&todo_list, opts)))
 		goto release_todo_list;
 
+	/* If we were asked to skip this commit, rollback
+	 * and continue with the next */
+	if (cmd == 's') {
+		if ((res = rollback_single_pick()))
+			goto release_todo_list;
+		discard_cache();
+		if ((res = read_cache()) < 0)
+			goto release_todo_list;
+		printf("index unchanged: %d\n", is_index_unchanged());
+		goto skip_this_commit;
+	}
+
 	/* check if there is something to commit */
 	res = is_index_unchanged();
 	if (res < 0)
diff --git a/sequencer.h b/sequencer.h
index f8b8bd0063..afc4bb4e6c 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -41,7 +41,7 @@ struct replay_opts {
 #define REPLAY_OPTS_INIT { -1 }
 
 int sequencer_pick_revisions(struct replay_opts *opts);
-int sequencer_continue(struct replay_opts *opts);
+int sequencer_continue(struct replay_opts *opts, char cmd);
 int sequencer_rollback(struct replay_opts *opts);
 int sequencer_remove_state(struct replay_opts *opts);
 
-- 
2.11.0.616.gd72966cf44.dirty


^ permalink raw reply related

* [PATCH 0/5] sequencer: allow skipping commits
From: Giuseppe Bilotta @ 2017-01-23 22:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Giuseppe Bilotta

This series introduces a few options to the sequencer,
to allow skipping unwanted/unnecessary commits.

The first patch is just cleanup. The second fixes a potential issue
about sequencing options not being correctly remembered across
interruptions.

The next two introduce cherry-pick options to skip empty (or only
redundant) commits. The two options are introduced separately because
of the complexity associated with the possible combinations that can be
had.

The last commit allows --skip as a reset + --continue, to quickly skip
the current commit during a failed cherry-pick or revert (for example
because a better version of the commit was already merged).

Giuseppe Bilotta (5):
  sequencer: sort options load/save by struct position
  sequencer: save/load all options
  cherry-pick: option to skip empty commits
  cherry-pick: allow skipping only redundant commits
  sequencer: allow to --skip current commit

 Documentation/git-cherry-pick.txt |  10 +++
 Documentation/sequencer.txt       |  10 ++-
 builtin/revert.c                  |  24 +++++-
 sequencer.c                       | 163 ++++++++++++++++++++++++++++++--------
 sequencer.h                       |   4 +-
 5 files changed, 176 insertions(+), 35 deletions(-)

-- 
2.11.0.616.gd72966cf44.dirty


^ permalink raw reply

* Re: [PATCH] rebase: pass --signoff option to git am
From: Giuseppe Bilotta @ 2017-01-23 22:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List
In-Reply-To: <xmqqbmuxr0pd.fsf@gitster.mtv.corp.google.com>

On Mon, Jan 23, 2017 at 9:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes:
>
>> On Mon, Jan 23, 2017 at 7:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> Should we plan to extend this to the interactive backend that is
>>> shared between rebase -i and rebase -m, too?  Or is this patch
>>> already sufficient to cover them?
>>
>> AFAIK this is sufficient for both, in the sense that I've used it with
>> git rebase -i and it works.
>
> That is a good news and at the same time a bit awkard one ;-)
>
> The mention of "passed to 'git am'" twice in the documentation and
> help text would lead people to think "rebase -i" would not be
> affected and (1) would need more work to do so, or (2) the user does
> not want "rebase -i" to be unaffected for whatever reason, and gets
> surprised to see that it actually does get affected.

I'm not sure I follow. If the user doesn't want to signoff during a
rebase, they can simply not pass --signoff. If they do, they can not
pass it. Am I missing something?

> In any case, will queue as-is so that we won't lose the patch while
> waiting for people to raise their opinions.

Thanks.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply

* Re: [RFC PATCH] Option to allow cherry-pick to skip empty commits
From: Giuseppe Bilotta @ 2017-01-23 22:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List
In-Reply-To: <xmqqfuk9r0zw.fsf@gitster.mtv.corp.google.com>

On Mon, Jan 23, 2017 at 9:10 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes:
>
>> ... I still don't see how to force a complete reread of the index
>> after running a git reset (which I need for the --skip command), ...
>
> Do you mean discard_index() or discard_cache() followed by
> read_index() or read_cache(), or do you mean something more
> elaborate?

Apparently discard_cache() followed by read_cache() is exactly what I
needed. New patchset incoming 8-)




-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply

* Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure
From: Brandon Williams @ 2017-01-23 22:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, pclouds
In-Reply-To: <xmqq37g9qwr2.fsf@gitster.mtv.corp.google.com>

On 01/23, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > The last big hurdle towards a thread-safe API for the attribute system
> > is the reliance on a global attribute stack that is modified during each
> > call into the attribute system.
> 
> The same comment as 22/27 applies here.  
> 
> It is not an immediate problem we need to solve in the scope of this
> series, in the sense that a Big Subsystem Lock for the attribute
> subsystem around git_check_attr() function can make it thread-safe.
> 
> But if we want to make it truly threadable without a Big Subsystem
> Lock, this and the other one would need to become per-thread at
> least.  I think the check_all_attrs scoreboard, which is the topic
> of 22/27, should become per git_check_attr() invocation (immediately
> before making a call to collect_some_attrs(), prepare an array with
> map.size elements and use that as a scoreboard, for example).  I do
> not think we can be sure that the "slimmed down attr stack" 15/27
> envisions would help performance without benchmarking, but if it
> does, then the "attr stack that holds entries that are relevant to
> the current query" would have to become per <thread, check> pair, as
> two threads may be executing the same codepath looking for the same
> set of attributes (i.e. sharing a single attr_check instance), but
> working on two different parts of a tree structure.
> 
> > This patch removes this global stack and instead a stack is stored
> > locally in each attr_check instance.  This opens up the opportunity for
> > future optimizations to customize the attribute stack for the attributes
> > that a particular attr_check struct is interested in.
> 
> This is still true.  But two threads hitting the same attr_check
> would make the stack thrash between the paths they are working on to
> hurt performance once we go multi-threaded.
> 
> Perhaps, provided if the "slimmed down attr stack" is indeed a good
> idea, we should keep the global hashmap that holds everything we
> read from .gitattributes tree-wide (i.e. as in your v1), _and_
> introduce a mechanism to keep the slimmed down version that is
> relevant to check[] for each thread somehow.

Sounds good,  I'll reintroduce the hashmap of stacks that I had in v1
and instead make the all_attrs array that is used the in collection
process allocated at invocation time.  That will cause a bit of
allocation churn but in reality shouldn't make that much of an impact.

As we discussed off-line I'll also do the rework to break up the
question and result.  That way two threads can be executing using the
same attr_check structure.

-- 
Brandon Williams

^ permalink raw reply

* Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure
From: Junio C Hamano @ 2017-01-23 21:42 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, pclouds
In-Reply-To: <20170123203525.185058-26-bmwill@google.com>

Brandon Williams <bmwill@google.com> writes:

> The last big hurdle towards a thread-safe API for the attribute system
> is the reliance on a global attribute stack that is modified during each
> call into the attribute system.

The same comment as 22/27 applies here.  

It is not an immediate problem we need to solve in the scope of this
series, in the sense that a Big Subsystem Lock for the attribute
subsystem around git_check_attr() function can make it thread-safe.

But if we want to make it truly threadable without a Big Subsystem
Lock, this and the other one would need to become per-thread at
least.  I think the check_all_attrs scoreboard, which is the topic
of 22/27, should become per git_check_attr() invocation (immediately
before making a call to collect_some_attrs(), prepare an array with
map.size elements and use that as a scoreboard, for example).  I do
not think we can be sure that the "slimmed down attr stack" 15/27
envisions would help performance without benchmarking, but if it
does, then the "attr stack that holds entries that are relevant to
the current query" would have to become per <thread, check> pair, as
two threads may be executing the same codepath looking for the same
set of attributes (i.e. sharing a single attr_check instance), but
working on two different parts of a tree structure.

> This patch removes this global stack and instead a stack is stored
> locally in each attr_check instance.  This opens up the opportunity for
> future optimizations to customize the attribute stack for the attributes
> that a particular attr_check struct is interested in.

This is still true.  But two threads hitting the same attr_check
would make the stack thrash between the paths they are working on to
hurt performance once we go multi-threaded.

Perhaps, provided if the "slimmed down attr stack" is indeed a good
idea, we should keep the global hashmap that holds everything we
read from .gitattributes tree-wide (i.e. as in your v1), _and_
introduce a mechanism to keep the slimmed down version that is
relevant to check[] for each thread somehow.

^ permalink raw reply

* Re: [PATCH v2 22/27] attr: eliminate global check_all_attr array
From: Junio C Hamano @ 2017-01-23 21:11 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, pclouds
In-Reply-To: <20170123203525.185058-23-bmwill@google.com>

Brandon Williams <bmwill@google.com> writes:

> Currently there is a reliance on 'check_all_attr' which is a global
> array of 'attr_check_item' items which is used to store the value of
> each attribute during the collection process.
>
> This patch eliminates this global and instead creates an array per
> 'attr_check' instance which is then used in the attribute collection
> process.  This brings the attribute system one step closer to being
> thread-safe.

Hmph, how close is "closer"?  

My understanding of this is that a codepath that has a single
"attr_check" can be executing simultaneously by multiple threads,
and "attr_check" is meant to contain read-only stuff sharable by
them.  Unless this check_all_attr is tied to the attr_result (which
in turn is tied to each invocation and typically is on stack), the
resulting code would not be safe, right?


^ permalink raw reply

* [PATCH v2 25/27] attr: store attribute stack in attr_check structure
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, gitster, pclouds
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

The last big hurdle towards a thread-safe API for the attribute system
is the reliance on a global attribute stack that is modified during each
call into the attribute system.

This patch removes this global stack and instead a stack is stored
locally in each attr_check instance.  This opens up the opportunity for
future optimizations to customize the attribute stack for the attributes
that a particular attr_check struct is interested in.

One caveat with pushing the attribute stack into the attr_check
structure is that the attribute system now needs to keep track of all
active attr_check instances.  Due to the direction mechanism the stack
needs to be dropped when the direction is switched.  In order to ensure
correctness when the direction is changed the attribute system needs to
iterate through all active attr_check instances and drop each of their
stacks.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c | 277 ++++++++++++++++++++++++++++++++++++++++++++---------------------
 attr.h |   3 +
 2 files changed, 193 insertions(+), 87 deletions(-)

diff --git a/attr.c b/attr.c
index 95456503e..d64d1959e 100644
--- a/attr.c
+++ b/attr.c
@@ -434,17 +434,16 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
  * .gitignore file and info/excludes file as a fallback.
  */
 
-/* NEEDSWORK: This will become per git_attr_check */
-static struct attr_stack {
+struct attr_stack {
 	struct attr_stack *prev;
 	char *origin;
 	size_t originlen;
 	unsigned num_matches;
 	unsigned alloc;
 	struct match_attr **attrs;
-} *attr_stack;
+};
 
-static void free_attr_elem(struct attr_stack *e)
+static void attr_stack_free(struct attr_stack *e)
 {
 	int i;
 	free(e->origin);
@@ -467,6 +466,85 @@ static void free_attr_elem(struct attr_stack *e)
 	free(e);
 }
 
+/* List of all attr_check structs; access should be surrounded by mutex */
+static struct check_vector {
+	size_t nr;
+	size_t alloc;
+	struct attr_check **checks;
+#ifndef NO_PTHREADS
+	pthread_mutex_t mutex;
+#endif
+} check_vector;
+
+static inline void vector_lock(void)
+{
+#ifndef NO_PTHREADS
+	pthread_mutex_lock(&check_vector.mutex);
+#endif
+}
+
+static inline void vector_unlock(void)
+{
+#ifndef NO_PTHREADS
+	pthread_mutex_unlock(&check_vector.mutex);
+#endif
+}
+
+static void check_vector_add(struct attr_check *c)
+{
+	vector_lock();
+
+	ALLOC_GROW(check_vector.checks,
+		   check_vector.nr + 1,
+		   check_vector.alloc);
+	check_vector.checks[check_vector.nr++] = c;
+
+	vector_unlock();
+}
+
+static void check_vector_remove(struct attr_check *check)
+{
+	int i;
+
+	vector_lock();
+
+	/* Find entry */
+	for (i = 0; i < check_vector.nr; i++)
+		if (check_vector.checks[i] == check)
+			break;
+
+	if (i >= check_vector.nr)
+		die("BUG: no entry found");
+
+	/* shift entries over */
+	for (; i < check_vector.nr - 1; i++)
+		check_vector.checks[i] = check_vector.checks[i + 1];
+
+	check_vector.nr--;
+
+	vector_unlock();
+}
+
+/* Iterate through all attr_check instances and drop their stacks */
+static void drop_attr_stack(void)
+{
+	int i;
+
+	vector_lock();
+
+	for (i = 0; i < check_vector.nr; i++) {
+		struct attr_stack **stack = &check_vector.checks[i]->stack;
+
+		while (*stack) {
+			struct attr_stack *elem = *stack;
+			*stack = elem->prev;
+			attr_stack_free(elem);
+		}
+	}
+
+	vector_unlock();
+}
+
 static const char *builtin_attr[] = {
 	"[attr]binary -diff -merge -text",
 	NULL,
@@ -621,15 +699,6 @@ static void debug_set(const char *what, const char *match, struct git_attr *attr
 #define debug_set(a,b,c,d) do { ; } while (0)
 #endif /* DEBUG_ATTR */
 
-static void drop_attr_stack(void)
-{
-	while (attr_stack) {
-		struct attr_stack *elem = attr_stack;
-		attr_stack = elem->prev;
-		free_attr_elem(elem);
-	}
-}
-
 static const char *git_etc_gitattributes(void)
 {
 	static const char *system_wide;
@@ -638,6 +707,14 @@ static const char *git_etc_gitattributes(void)
 	return system_wide;
 }
 
+static const char *get_home_gitattributes(void)
+{
+	if (!git_attributes_file)
+		git_attributes_file = xdg_config_home("attributes");
+
+	return git_attributes_file;
+}
+
 static int git_attr_system(void)
 {
 	return !git_env_bool("GIT_ATTR_NOSYSTEM", 0);
@@ -657,47 +734,50 @@ static void push_stack(struct attr_stack **attr_stack_p,
 	}
 }
 
-static void bootstrap_attr_stack(void)
+static void bootstrap_attr_stack(struct attr_stack **stack)
 {
-	struct attr_stack *elem;
+	struct attr_stack *e;
 
-	if (attr_stack)
+	if (*stack)
 		return;
 
-	push_stack(&attr_stack, read_attr_from_array(builtin_attr), NULL, 0);
-
-	if (git_attr_system())
-		push_stack(&attr_stack,
-			   read_attr_from_file(git_etc_gitattributes(), 1),
-			   NULL, 0);
+	/* builtin frame */
+	e = read_attr_from_array(builtin_attr);
+	push_stack(stack, e, NULL, 0);
 
-	if (!git_attributes_file)
-		git_attributes_file = xdg_config_home("attributes");
-	if (git_attributes_file)
-		push_stack(&attr_stack,
-			   read_attr_from_file(git_attributes_file, 1),
-			   NULL, 0);
+	/* system-wide frame */
+	if (git_attr_system()) {
+		e = read_attr_from_file(git_etc_gitattributes(), 1);
+		push_stack(stack, e, NULL, 0);
+	}
 
-	if (!is_bare_repository() || direction == GIT_ATTR_INDEX) {
-		elem = read_attr(GITATTRIBUTES_FILE, 1);
-		push_stack(&attr_stack, elem, xstrdup(""), 0);
-		debug_push(elem);
+	/* home directory */
+	if (get_home_gitattributes()) {
+		e = read_attr_from_file(get_home_gitattributes(), 1);
+		push_stack(stack, e, NULL, 0);
 	}
 
-	if (startup_info->have_repository)
-		elem = read_attr_from_file(git_path_info_attributes(), 1);
+	/* root directory */
+	if (!is_bare_repository() || direction == GIT_ATTR_INDEX)
+		e = read_attr(GITATTRIBUTES_FILE, 1);
 	else
-		elem = NULL;
+		e = xcalloc(1, sizeof(struct attr_stack));
+	push_stack(stack, e, xstrdup(""), 0);
 
-	if (!elem)
-		elem = xcalloc(1, sizeof(*elem));
-	push_stack(&attr_stack, elem, NULL, 0);
+	/* info frame */
+	if (startup_info->have_repository)
+		e = read_attr_from_file(git_path_info_attributes(), 1);
+	else
+		e = NULL;
+	if (!e)
+		e = xcalloc(1, sizeof(struct attr_stack));
+	push_stack(stack, e, NULL, 0);
 }
 
-static void prepare_attr_stack(const char *path, int dirlen)
+static void prepare_attr_stack(const char *path, int dirlen,
+			       struct attr_stack **stack)
 {
-	struct attr_stack *elem, *info;
-	const char *cp;
+	struct attr_stack *info;
 
 	/*
 	 * At the bottom of the attribute stack is the built-in
@@ -714,13 +794,13 @@ static void prepare_attr_stack(const char *path, int dirlen)
 	 * .gitattributes in deeper directories to shallower ones,
 	 * and finally use the built-in set as the default.
 	 */
-	bootstrap_attr_stack();
+	bootstrap_attr_stack(stack);
 
 	/*
 	 * Pop the "info" one that is always at the top of the stack.
 	 */
-	info = attr_stack;
-	attr_stack = info->prev;
+	info = *stack;
+	*stack = info->prev;
 
 	/*
 	 * Pop the ones from directories that are not the prefix of
@@ -728,18 +808,19 @@ static void prepare_attr_stack(const char *path, int dirlen)
 	 * the root one (whose origin is an empty string "") or the builtin
 	 * one (whose origin is NULL) without popping it.
 	 */
-	while (attr_stack->origin) {
-		int namelen = strlen(attr_stack->origin);
+	while ((*stack)->origin) {
+		int namelen = (*stack)->originlen;
+		struct attr_stack *elem;
 
-		elem = attr_stack;
+		elem = *stack;
 		if (namelen <= dirlen &&
 		    !strncmp(elem->origin, path, namelen) &&
 		    (!namelen || path[namelen] == '/'))
 			break;
 
 		debug_pop(elem);
-		attr_stack = elem->prev;
-		free_attr_elem(elem);
+		*stack = elem->prev;
+		attr_stack_free(elem);
 	}
 
 	/*
@@ -754,33 +835,43 @@ static void prepare_attr_stack(const char *path, int dirlen)
 		 */
 		struct strbuf pathbuf = STRBUF_INIT;
 
-		assert(attr_stack->origin);
-		while (1) {
-			size_t len = strlen(attr_stack->origin);
+		assert((*stack)->origin);
+		strbuf_addstr(&pathbuf, (*stack)->origin);
+		/* Build up to the directory 'path' is in */
+		while (pathbuf.len < dirlen) {
+			size_t len = pathbuf.len;
+			struct attr_stack *next;
 			char *origin;
 
-			if (dirlen <= len)
-				break;
-			cp = memchr(path + len + 1, '/', dirlen - len - 1);
-			if (!cp)
-				cp = path + dirlen;
-			strbuf_addf(&pathbuf,
-				    "%.*s/%s", (int)(cp - path), path,
-				    GITATTRIBUTES_FILE);
-			elem = read_attr(pathbuf.buf, 0);
-			strbuf_setlen(&pathbuf, cp - path);
-			origin = strbuf_detach(&pathbuf, &len);
-			push_stack(&attr_stack, elem, origin, len);
-			debug_push(elem);
-		}
+			/* Skip path-separator */
+			if (len < dirlen && is_dir_sep(path[len]))
+				len++;
+			/* Find the end of the next component */
+			while (len < dirlen && !is_dir_sep(path[len]))
+				len++;
+
+			if (pathbuf.len > 0)
+				strbuf_addch(&pathbuf, '/');
+			strbuf_add(&pathbuf, path + pathbuf.len,
+				   (len - pathbuf.len));
+			strbuf_addf(&pathbuf, "/%s", GITATTRIBUTES_FILE);
 
+			next = read_attr(pathbuf.buf, 0);
+
+			/* reset the pathbuf to not include "/.gitattributes" */
+			strbuf_setlen(&pathbuf, len);
+
+			origin = xstrdup(pathbuf.buf);
+			push_stack(stack, next, origin, len);
+
+		}
 		strbuf_release(&pathbuf);
 	}
 
 	/*
 	 * Finally push the "info" one at the top of the stack.
 	 */
-	push_stack(&attr_stack, info, NULL, 0);
+	push_stack(stack, info, NULL, 0);
 }
 
 static int path_matches(const char *pathname, int pathlen,
@@ -831,20 +922,23 @@ static int fill_one(const char *what, struct attr_check_item *all_attrs,
 }
 
 static int fill(const char *path, int pathlen, int basename_offset,
-		struct attr_stack *stk, struct attr_check_item *all_attrs,
-		int rem)
+		const struct attr_stack *stack,
+		struct attr_check_item *all_attrs, int rem)
 {
-	int i;
-	const char *base = stk->origin ? stk->origin : "";
-
-	for (i = stk->num_matches - 1; 0 < rem && 0 <= i; i--) {
-		const struct match_attr *a = stk->attrs[i];
-		if (a->is_macro)
-			continue;
-		if (path_matches(path, pathlen, basename_offset,
-				 &a->u.pat, base, stk->originlen))
-			rem = fill_one("fill", all_attrs, a, rem);
+	for (; rem > 0 && stack; stack = stack->prev) {
+		int i;
+		const char *base = stack->origin ? stack->origin : "";
+
+		for (i = stack->num_matches - 1; 0 < rem && 0 <= i; i--) {
+			const struct match_attr *a = stack->attrs[i];
+			if (a->is_macro)
+				continue;
+			if (path_matches(path, pathlen, basename_offset,
+					 &a->u.pat, base, stack->originlen))
+				rem = fill_one("fill", all_attrs, a, rem);
+		}
 	}
+
 	return rem;
 }
 
@@ -887,7 +981,6 @@ static void determine_macros(struct attr_check_item *all_attrs,
  */
 static void collect_some_attrs(const char *path, struct attr_check *check)
 {
-	struct attr_stack *stk;
 	int i, pathlen, rem, dirlen;
 	const char *cp, *last_slash = NULL;
 	int basename_offset;
@@ -905,9 +998,9 @@ static void collect_some_attrs(const char *path, struct attr_check *check)
 		dirlen = 0;
 	}
 
-	prepare_attr_stack(path, dirlen);
+	prepare_attr_stack(path, dirlen, &check->stack);
 	all_attrs_init(&g_attr_hashmap, check);
-	determine_macros(check->all_attrs, attr_stack);
+	determine_macros(check->all_attrs, check->stack);
 
 	if (check->check_nr) {
 		rem = 0;
@@ -924,8 +1017,7 @@ static void collect_some_attrs(const char *path, struct attr_check *check)
 	}
 
 	rem = check->all_attrs_nr;
-	for (stk = attr_stack; 0 < rem && stk; stk = stk->prev)
-		rem = fill(path, pathlen, basename_offset, stk, check->all_attrs, rem);
+	fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
 }
 
 int git_check_attr(const char *path, struct attr_check *check)
@@ -965,7 +1057,12 @@ void git_all_attrs(const char *path, struct attr_check *check)
 
 struct attr_check *attr_check_alloc(void)
 {
-	return xcalloc(1, sizeof(struct attr_check));
+	struct attr_check *c = xcalloc(1, sizeof(struct attr_check));
+
+	/* save pointer to the check struct */
+	check_vector_add(c);
+
+	return c;
 }
 
 struct attr_check *attr_check_initl(const char *one, ...)
@@ -1032,8 +1129,13 @@ void attr_check_clear(struct attr_check *check)
 
 void attr_check_free(struct attr_check *check)
 {
-	attr_check_clear(check);
-	free(check);
+	if (check) {
+		/* Remove check from the check vector */
+		check_vector_remove(check);
+
+		attr_check_clear(check);
+		free(check);
+	}
 }
 
 void git_attr_set_direction(enum git_attr_direction new, struct index_state *istate)
@@ -1053,5 +1155,6 @@ void attr_start(void)
 {
 #ifndef NO_PTHREADS
 	pthread_mutex_init(&g_attr_hashmap.mutex, NULL);
+	pthread_mutex_init(&check_vector.mutex, NULL);
 #endif
 }
diff --git a/attr.h b/attr.h
index 9b4dc07d8..da7c3a229 100644
--- a/attr.h
+++ b/attr.h
@@ -4,6 +4,8 @@
 /* An attribute is a pointer to this opaque structure */
 struct git_attr;
 
+struct attr_stack;
+
 /*
  * Given a string, return the gitattribute object that
  * corresponds to it.
@@ -41,6 +43,7 @@ struct attr_check {
 	struct attr_check_item *check;
 	int all_attrs_nr;
 	struct attr_check_item *all_attrs;
+	struct attr_stack *stack;
 };
 
 extern struct attr_check *attr_check_alloc(void);
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 27/27] attr: reformat git_attr_set_direction() function
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, gitster, pclouds
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

Move the 'git_attr_set_direction()' up to be closer to the variables
that it modifies as well as a small formatting by renaming the variable
'new' to 'new_direction' so that it is more descriptive.

Update the comment about how 'direction' is used to read the state of
the world.  It should be noted that callers of
'git_attr_set_direction()' should ensure that other threads are not
making calls into the attribute system until after the call to
'git_attr_set_direction()' completes.  This function essentially acts as
reset button for the attribute system and should be handled with care.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c | 49 ++++++++++++++++++++-----------------------------
 attr.h |  3 ++-
 2 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/attr.c b/attr.c
index c2ea5cb29..f35c1107f 100644
--- a/attr.c
+++ b/attr.c
@@ -578,26 +578,30 @@ static struct attr_stack *read_attr_from_array(const char **list)
 }
 
 /*
- * NEEDSWORK: these two are tricky.  The callers assume there is a
- * single, system-wide global state "where we read attributes from?"
- * and when the state is flipped by calling git_attr_set_direction(),
- * attr_stack is discarded so that subsequent attr_check will lazily
- * read from the right place.  And they do not know or care who called
- * by them uses the attribute subsystem, hence have no knowledge of
- * existing git_attr_check instances or future ones that will be
- * created).
- *
- * Probably we need a thread_local that holds these two variables,
- * and a list of git_attr_check instances (which need to be maintained
- * by hooking into git_attr_check_alloc(), git_attr_check_initl(), and
- * git_attr_check_clear().  Then git_attr_set_direction() updates the
- * fields in that thread_local for these two variables, iterate over
- * all the active git_attr_check instances and discard the attr_stack
- * they hold.  Yuck, but it sounds doable.
+ * Callers into the attribute system assume there is a single, system-wide
+ * global state where attributes are read from and when the state is flipped by
+ * calling git_attr_set_direction(), the stack frames that have been
+ * constructed need to be discarded so so that subsequent calls into the
+ * attribute system will lazily read from the right place.  Since changing
+ * direction causes a global paradigm shift, it should not ever be called while
+ * another thread could potentially be calling into the attribute system.
  */
 static enum git_attr_direction direction;
 static struct index_state *use_index;
 
+void git_attr_set_direction(enum git_attr_direction new_direction,
+			    struct index_state *istate)
+{
+	if (is_bare_repository() && new_direction != GIT_ATTR_INDEX)
+		die("BUG: non-INDEX attr direction in a bare repo");
+
+	if (new_direction != direction)
+		drop_attr_stack();
+
+	direction = new_direction;
+	use_index = istate;
+}
+
 static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 {
 	FILE *fp = fopen(path, "r");
@@ -1132,19 +1136,6 @@ void attr_check_free(struct attr_check *check)
 	}
 }
 
-void git_attr_set_direction(enum git_attr_direction new, struct index_state *istate)
-{
-	enum git_attr_direction old = direction;
-
-	if (is_bare_repository() && new != GIT_ATTR_INDEX)
-		die("BUG: non-INDEX attr direction in a bare repo");
-
-	direction = new;
-	if (new != old)
-		drop_attr_stack();
-	use_index = istate;
-}
-
 void attr_start(void)
 {
 #ifndef NO_PTHREADS
diff --git a/attr.h b/attr.h
index da7c3a229..62dbcb6b8 100644
--- a/attr.h
+++ b/attr.h
@@ -76,7 +76,8 @@ enum git_attr_direction {
 	GIT_ATTR_CHECKOUT,
 	GIT_ATTR_INDEX
 };
-void git_attr_set_direction(enum git_attr_direction, struct index_state *);
+void git_attr_set_direction(enum git_attr_direction new_direction,
+			    struct index_state *istate);
 
 extern void attr_start(void);
 
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 26/27] attr: push the bare repo check into read_attr()
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, gitster, pclouds
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

Push the bare repository check into the 'read_attr()' function.  This
avoids needing to have extra logic which creates an empty stack frame
when inside a bare repo as a similar bit of logic already exists in the
'read_attr()' function.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c | 114 +++++++++++++++++++++++++++++++----------------------------------
 1 file changed, 54 insertions(+), 60 deletions(-)

diff --git a/attr.c b/attr.c
index d64d1959e..c2ea5cb29 100644
--- a/attr.c
+++ b/attr.c
@@ -648,25 +648,28 @@ static struct attr_stack *read_attr_from_index(const char *path, int macro_ok)
 
 static struct attr_stack *read_attr(const char *path, int macro_ok)
 {
-	struct attr_stack *res;
+	struct attr_stack *res = NULL;
 
-	if (direction == GIT_ATTR_CHECKOUT) {
+	if (direction == GIT_ATTR_INDEX) {
 		res = read_attr_from_index(path, macro_ok);
-		if (!res)
-			res = read_attr_from_file(path, macro_ok);
-	}
-	else if (direction == GIT_ATTR_CHECKIN) {
-		res = read_attr_from_file(path, macro_ok);
-		if (!res)
-			/*
-			 * There is no checked out .gitattributes file there, but
-			 * we might have it in the index.  We allow operation in a
-			 * sparsely checked out work tree, so read from it.
-			 */
+	} else if (!is_bare_repository()) {
+		if (direction == GIT_ATTR_CHECKOUT) {
 			res = read_attr_from_index(path, macro_ok);
+			if (!res)
+				res = read_attr_from_file(path, macro_ok);
+		} else if (direction == GIT_ATTR_CHECKIN) {
+			res = read_attr_from_file(path, macro_ok);
+			if (!res)
+				/*
+				 * There is no checked out .gitattributes file
+				 * there, but we might have it in the index.
+				 * We allow operation in a sparsely checked out
+				 * work tree, so read from it.
+				 */
+				res = read_attr_from_index(path, macro_ok);
+		}
 	}
-	else
-		res = read_attr_from_index(path, macro_ok);
+
 	if (!res)
 		res = xcalloc(1, sizeof(*res));
 	return res;
@@ -758,10 +761,7 @@ static void bootstrap_attr_stack(struct attr_stack **stack)
 	}
 
 	/* root directory */
-	if (!is_bare_repository() || direction == GIT_ATTR_INDEX)
-		e = read_attr(GITATTRIBUTES_FILE, 1);
-	else
-		e = xcalloc(1, sizeof(struct attr_stack));
+	e = read_attr(GITATTRIBUTES_FILE, 1);
 	push_stack(stack, e, xstrdup(""), 0);
 
 	/* info frame */
@@ -778,6 +778,7 @@ static void prepare_attr_stack(const char *path, int dirlen,
 			       struct attr_stack **stack)
 {
 	struct attr_stack *info;
+	struct strbuf pathbuf = STRBUF_INIT;
 
 	/*
 	 * At the bottom of the attribute stack is the built-in
@@ -824,54 +825,47 @@ static void prepare_attr_stack(const char *path, int dirlen,
 	}
 
 	/*
-	 * Read from parent directories and push them down
+	 * bootstrap_attr_stack() should have added, and the
+	 * above loop should have stopped before popping, the
+	 * root element whose attr_stack->origin is set to an
+	 * empty string.
 	 */
-	if (!is_bare_repository() || direction == GIT_ATTR_INDEX) {
-		/*
-		 * bootstrap_attr_stack() should have added, and the
-		 * above loop should have stopped before popping, the
-		 * root element whose attr_stack->origin is set to an
-		 * empty string.
-		 */
-		struct strbuf pathbuf = STRBUF_INIT;
-
-		assert((*stack)->origin);
-		strbuf_addstr(&pathbuf, (*stack)->origin);
-		/* Build up to the directory 'path' is in */
-		while (pathbuf.len < dirlen) {
-			size_t len = pathbuf.len;
-			struct attr_stack *next;
-			char *origin;
-
-			/* Skip path-separator */
-			if (len < dirlen && is_dir_sep(path[len]))
-				len++;
-			/* Find the end of the next component */
-			while (len < dirlen && !is_dir_sep(path[len]))
-				len++;
-
-			if (pathbuf.len > 0)
-				strbuf_addch(&pathbuf, '/');
-			strbuf_add(&pathbuf, path + pathbuf.len,
-				   (len - pathbuf.len));
-			strbuf_addf(&pathbuf, "/%s", GITATTRIBUTES_FILE);
-
-			next = read_attr(pathbuf.buf, 0);
-
-			/* reset the pathbuf to not include "/.gitattributes" */
-			strbuf_setlen(&pathbuf, len);
-
-			origin = xstrdup(pathbuf.buf);
-			push_stack(stack, next, origin, len);
-
-		}
-		strbuf_release(&pathbuf);
+	assert((*stack)->origin);
+
+	strbuf_addstr(&pathbuf, (*stack)->origin);
+	/* Build up to the directory 'path' is in */
+	while (pathbuf.len < dirlen) {
+		size_t len = pathbuf.len;
+		struct attr_stack *next;
+		char *origin;
+
+		/* Skip path-separator */
+		if (len < dirlen && is_dir_sep(path[len]))
+			len++;
+		/* Find the end of the next component */
+		while (len < dirlen && !is_dir_sep(path[len]))
+			len++;
+
+		if (pathbuf.len > 0)
+			strbuf_addch(&pathbuf, '/');
+		strbuf_add(&pathbuf, path + pathbuf.len, (len - pathbuf.len));
+		strbuf_addf(&pathbuf, "/%s", GITATTRIBUTES_FILE);
+
+		next = read_attr(pathbuf.buf, 0);
+
+		/* reset the pathbuf to not include "/.gitattributes" */
+		strbuf_setlen(&pathbuf, len);
+
+		origin = xstrdup(pathbuf.buf);
+		push_stack(stack, next, origin, len);
 	}
 
 	/*
 	 * Finally push the "info" one at the top of the stack.
 	 */
 	push_stack(stack, info, NULL, 0);
+
+	strbuf_release(&pathbuf);
 }
 
 static int path_matches(const char *pathname, int pathlen,
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 19/27] attr: pass struct attr_check to collect_some_attrs
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, gitster, pclouds
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

The old callchain used to take an array of attr_check_item items.
Instead pass the 'attr_check' container object to 'collect_some_attrs()'
and access the fields in the data structure directly.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/attr.c b/attr.c
index da727e3fd..e58fa340c 100644
--- a/attr.c
+++ b/attr.c
@@ -777,9 +777,7 @@ static int macroexpand_one(int nr, int rem)
  * check_all_attr. If num is non-zero, only attributes in check[] are
  * collected. Otherwise all attributes are collected.
  */
-static void collect_some_attrs(const char *path, int num,
-			       struct attr_check_item *check)
-
+static void collect_some_attrs(const char *path, struct attr_check *check)
 {
 	struct attr_stack *stk;
 	int i, pathlen, rem, dirlen;
@@ -802,17 +800,18 @@ static void collect_some_attrs(const char *path, int num,
 	prepare_attr_stack(path, dirlen);
 	for (i = 0; i < attr_nr; i++)
 		check_all_attr[i].value = ATTR__UNKNOWN;
-	if (num && !cannot_trust_maybe_real) {
+	if (check->check_nr && !cannot_trust_maybe_real) {
 		rem = 0;
-		for (i = 0; i < num; i++) {
-			if (!check[i].attr->maybe_real) {
+		for (i = 0; i < check->check_nr; i++) {
+			const struct git_attr *a = check->check[i].attr;
+			if (!a->maybe_real) {
 				struct attr_check_item *c;
-				c = check_all_attr + check[i].attr->attr_nr;
+				c = check_all_attr + a->attr_nr;
 				c->value = ATTR__UNSET;
 				rem++;
 			}
 		}
-		if (rem == num)
+		if (rem == check->check_nr)
 			return;
 	}
 
@@ -821,18 +820,17 @@ static void collect_some_attrs(const char *path, int num,
 		rem = fill(path, pathlen, basename_offset, stk, rem);
 }
 
-static int git_check_attrs(const char *path, int num,
-			   struct attr_check_item *check)
+int git_check_attr(const char *path, struct attr_check *check)
 {
 	int i;
 
-	collect_some_attrs(path, num, check);
+	collect_some_attrs(path, check);
 
-	for (i = 0; i < num; i++) {
-		const char *value = check_all_attr[check[i].attr->attr_nr].value;
+	for (i = 0; i < check->check_nr; i++) {
+		const char *value = check_all_attr[check->check[i].attr->attr_nr].value;
 		if (value == ATTR__UNKNOWN)
 			value = ATTR__UNSET;
-		check[i].value = value;
+		check->check[i].value = value;
 	}
 
 	return 0;
@@ -843,7 +841,7 @@ void git_all_attrs(const char *path, struct attr_check *check)
 	int i;
 
 	attr_check_reset(check);
-	collect_some_attrs(path, check->check_nr, check->check);
+	collect_some_attrs(path, check);
 
 	for (i = 0; i < attr_nr; i++) {
 		const char *name = check_all_attr[i].attr->name;
@@ -856,11 +854,6 @@ void git_all_attrs(const char *path, struct attr_check *check)
 	}
 }
 
-int git_check_attr(const char *path, struct attr_check *check)
-{
-	return git_check_attrs(path, check->check_nr, check->check);
-}
-
 struct attr_check *attr_check_alloc(void)
 {
 	return xcalloc(1, sizeof(struct attr_check));
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 08/27] attr.c: tighten constness around "git_attr" structure
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, sbeller, pclouds, Brandon Williams
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

From: Junio C Hamano <gitster@pobox.com>

It holds an interned string, and git_attr_name() is a way to peek
into it.  Make sure the involved pointer types are pointer-to-const.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c | 2 +-
 attr.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/attr.c b/attr.c
index e42f931b3..f7cf7ae30 100644
--- a/attr.c
+++ b/attr.c
@@ -43,7 +43,7 @@ static int cannot_trust_maybe_real;
 static struct git_attr_check *check_all_attr;
 static struct git_attr *(git_attr_hash[HASHSIZE]);
 
-char *git_attr_name(struct git_attr *attr)
+const char *git_attr_name(const struct git_attr *attr)
 {
 	return attr->name;
 }
diff --git a/attr.h b/attr.h
index 8b08d33af..00d7a662c 100644
--- a/attr.h
+++ b/attr.h
@@ -25,7 +25,7 @@ extern const char git_attr__false[];
  * Unset one is returned as NULL.
  */
 struct git_attr_check {
-	struct git_attr *attr;
+	const struct git_attr *attr;
 	const char *value;
 };
 
@@ -34,7 +34,7 @@ struct git_attr_check {
  * return value is a pointer to a null-delimited string that is part
  * of the internal data structure; it should not be modified or freed.
  */
-char *git_attr_name(struct git_attr *);
+extern const char *git_attr_name(const struct git_attr *);
 
 int git_check_attr(const char *path, int, struct git_attr_check *);
 
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 06/27] attr.c: mark where #if DEBUG ends more clearly
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, sbeller, pclouds, Brandon Williams
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

From: Junio C Hamano <gitster@pobox.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 9bdf87a6f..17297fffe 100644
--- a/attr.c
+++ b/attr.c
@@ -469,7 +469,7 @@ static void debug_set(const char *what, const char *match, struct git_attr *attr
 #define debug_push(a) do { ; } while (0)
 #define debug_pop(a) do { ; } while (0)
 #define debug_set(a,b,c,d) do { ; } while (0)
-#endif
+#endif /* DEBUG_ATTR */
 
 static void drop_attr_stack(void)
 {
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 15/27] attr: (re)introduce git_check_attr() and struct attr_check
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, sbeller, pclouds, Brandon Williams
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

From: Junio C Hamano <gitster@pobox.com>

A common pattern to check N attributes for many paths is to

 (1) prepare an array A of N attr_check_item items;
 (2) call git_attr() to intern the N attribute names and fill A;
 (3) repeatedly call git_check_attrs() for path with N and A;

A look-up for these N attributes for a single path P scans the
entire attr_stack, starting from the .git/info/attributes file and
then .gitattributes file in the directory the path P is in, going
upwards to find .gitattributes file found in parent directories.

An earlier commit 06a604e6 (attr: avoid heavy work when we know the
specified attr is not defined, 2014-12-28) tried to optimize out
this scanning for one trivial special case: when the attribute being
sought is known not to exist, we do not have to scan for it.  While
this may be a cheap and effective heuristic, it would not work well
when N is (much) more than 1.

What we would want is a more customized way to skip irrelevant
entries in the attribute stack, and the definition of irrelevance
is tied to the set of attributes passed to git_check_attrs() call,
i.e. the set of attributes being sought.  The data necessary for
this optimization needs to live alongside the set of attributes, but
a simple array of git_attr_check_elem simply does not have any place
for that.

Introduce "struct attr_check" that contains N, the number of
attributes being sought, and A, the array that holds N
attr_check_item items, and a function git_check_attr() that
takes a path P and this structure as its parameters.  This structure
can later be extended to hold extra data necessary for optimization.

Also, to make it easier to write the first two steps in common
cases, introduce git_attr_check_initl() helper function, which takes
a NULL-terminated list of attribute names and initialize this
structure.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 attr.h | 17 +++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/attr.c b/attr.c
index 2f180d609..be9e398e9 100644
--- a/attr.c
+++ b/attr.c
@@ -865,6 +865,80 @@ int git_all_attrs(const char *path, int *num, struct attr_check_item **check)
 	return 0;
 }
 
+struct attr_check *attr_check_alloc(void)
+{
+	return xcalloc(1, sizeof(struct attr_check));
+}
+
+int git_check_attr(const char *path, struct attr_check *check)
+{
+	return git_check_attrs(path, check->check_nr, check->check);
+}
+
+struct attr_check *attr_check_initl(const char *one, ...)
+{
+	struct attr_check *check;
+	int cnt;
+	va_list params;
+	const char *param;
+
+	va_start(params, one);
+	for (cnt = 1; (param = va_arg(params, const char *)) != NULL; cnt++)
+		;
+	va_end(params);
+
+	check = attr_check_alloc();
+	check->check_nr = cnt;
+	check->check_alloc = cnt;
+	check->check = xcalloc(cnt, sizeof(struct attr_check_item));
+
+	check->check[0].attr = git_attr(one);
+	va_start(params, one);
+	for (cnt = 1; cnt < check->check_nr; cnt++) {
+		struct git_attr *attr;
+		param = va_arg(params, const char *);
+		if (!param)
+			die("BUG: counted %d != ended at %d",
+			    check->check_nr, cnt);
+		attr = git_attr(param);
+		if (!attr)
+			die("BUG: %s: not a valid attribute name", param);
+		check->check[cnt].attr = attr;
+	}
+	va_end(params);
+	return check;
+}
+
+struct attr_check_item *attr_check_append(struct attr_check *check,
+					  const struct git_attr *attr)
+{
+	struct attr_check_item *item;
+
+	ALLOC_GROW(check->check, check->check_nr + 1, check->check_alloc);
+	item = &check->check[check->check_nr++];
+	item->attr = attr;
+	return item;
+}
+
+void attr_check_reset(struct attr_check *check)
+{
+	check->check_nr = 0;
+}
+
+void attr_check_clear(struct attr_check *check)
+{
+	free(check->check);
+	check->check = NULL;
+	check->check_alloc = 0;
+	check->check_nr = 0;
+}
+
+void attr_check_free(struct attr_check *check)
+{
+	attr_check_clear(check);
+	free(check);
+}
+
 void git_attr_set_direction(enum git_attr_direction new, struct index_state *istate)
 {
 	enum git_attr_direction old = direction;
diff --git a/attr.h b/attr.h
index efc7bb3b3..459347f4b 100644
--- a/attr.h
+++ b/attr.h
@@ -29,6 +29,22 @@ struct attr_check_item {
 	const char *value;
 };
 
+struct attr_check {
+	int check_nr;
+	int check_alloc;
+	struct attr_check_item *check;
+};
+
+extern struct attr_check *attr_check_alloc(void);
+extern struct attr_check *attr_check_initl(const char *, ...);
+
+extern struct attr_check_item *attr_check_append(struct attr_check *check,
+						 const struct git_attr *attr);
+
+extern void attr_check_reset(struct attr_check *check);
+extern void attr_check_clear(struct attr_check *check);
+extern void attr_check_free(struct attr_check *check);
+
 /*
  * Return the name of the attribute represented by the argument.  The
  * return value is a pointer to a null-delimited string that is part
@@ -37,6 +53,7 @@ struct attr_check_item {
 extern const char *git_attr_name(const struct git_attr *);
 
 int git_check_attrs(const char *path, int, struct attr_check_item *);
+extern int git_check_attr(const char *path, struct attr_check *check);
 
 /*
  * Retrieve all attributes that apply to the specified path.  *num
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 18/27] attr: retire git_check_attrs() API
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, sbeller, pclouds, Brandon Williams
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

From: Junio C Hamano <gitster@pobox.com>

Since nobody uses the old API, make it file-scope static, and update
the documentation to describe the new API.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/technical/api-gitattributes.txt | 86 +++++++++++++++++----------
 attr.c                                        |  3 +-
 attr.h                                        |  1 -
 3 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
index 260266867..82f5130e7 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -16,10 +16,15 @@ Data Structure
 	of no interest to the calling programs.  The name of the
 	attribute can be retrieved by calling `git_attr_name()`.
 
-`struct git_attr_check`::
+`struct attr_check_item`::
 
-	This structure represents a set of attributes to check in a call
-	to `git_check_attr()` function, and receives the results.
+	This structure represents one attribute and its value.
+
+`struct attr_check`::
+
+	This structure represents a collection of `attr_check_item`.
+	It is passed to `git_check_attr()` function, specifying the
+	attributes to check, and receives their values.
 
 
 Attribute Values
@@ -27,7 +32,7 @@ Attribute Values
 
 An attribute for a path can be in one of four states: Set, Unset,
 Unspecified or set to a string, and `.value` member of `struct
-git_attr_check` records it.  There are three macros to check these:
+attr_check_item` records it.  There are three macros to check these:
 
 `ATTR_TRUE()`::
 
@@ -48,49 +53,51 @@ value of the attribute for the path.
 Querying Specific Attributes
 ----------------------------
 
-* Prepare an array of `struct git_attr_check` to define the list of
-  attributes you would want to check.  To populate this array, you would
-  need to define necessary attributes by calling `git_attr()` function.
+* Prepare `struct attr_check` using attr_check_initl()
+  function, enumerating the names of attributes whose values you are
+  interested in, terminated with a NULL pointer.  Alternatively, an
+  empty `struct attr_check` can be prepared by calling
+  `attr_check_alloc()` function and then attributes you want to
+  ask about can be added to it with `attr_check_append()`
+  function.
 
 * Call `git_check_attr()` to check the attributes for the path.
 
-* Inspect `git_attr_check` structure to see how each of the attribute in
-  the array is defined for the path.
+* Inspect `attr_check` structure to see how each of the
+  attribute in the array is defined for the path.
 
 
 Example
 -------
 
-To see how attributes "crlf" and "indent" are set for different paths.
+To see how attributes "crlf" and "ident" are set for different paths.
 
-. Prepare an array of `struct git_attr_check` with two elements (because
-  we are checking two attributes).  Initialize their `attr` member with
-  pointers to `struct git_attr` obtained by calling `git_attr()`:
+. Prepare a `struct attr_check` with two elements (because
+  we are checking two attributes):
 
 ------------
-static struct git_attr_check check[2];
+static struct attr_check *check;
 static void setup_check(void)
 {
-	if (check[0].attr)
+	if (check)
 		return; /* already done */
-	check[0].attr = git_attr("crlf");
-	check[1].attr = git_attr("ident");
+	check = attr_check_initl("crlf", "ident", NULL);
 }
 ------------
 
-. Call `git_check_attr()` with the prepared array of `struct git_attr_check`:
+. Call `git_check_attr()` with the prepared `struct attr_check`:
 
 ------------
 	const char *path;
 
 	setup_check();
-	git_check_attr(path, ARRAY_SIZE(check), check);
+	git_check_attr(path, check);
 ------------
 
-. Act on `.value` member of the result, left in `check[]`:
+. Act on `.value` member of the result, left in `check->check[]`:
 
 ------------
-	const char *value = check[0].value;
+	const char *value = check->check[0].value;
 
 	if (ATTR_TRUE(value)) {
 		The attribute is Set, by listing only the name of the
@@ -109,20 +116,39 @@ static void setup_check(void)
 	}
 ------------
 
+To see how attributes in argv[] are set for different paths, only
+the first step in the above would be different.
+
+------------
+static struct attr_check *check;
+static void setup_check(const char **argv)
+{
+	check = attr_check_alloc();
+	while (*argv) {
+		struct git_attr *attr = git_attr(*argv);
+		attr_check_append(check, attr);
+		argv++;
+	}
+}
+------------
+
 
 Querying All Attributes
 -----------------------
 
 To get the values of all attributes associated with a file:
 
-* Call `git_all_attrs()`, which returns an array of `git_attr_check`
-  structures.
+* Prepare an empty `attr_check` structure by calling
+  `attr_check_alloc()`.
+
+* Call `git_all_attrs()`, which populates the `attr_check`
+  with the attributes attached to the path.
 
-* Iterate over the `git_attr_check` array to examine the attribute
-  names and values.  The name of the attribute described by a
-  `git_attr_check` object can be retrieved via
-  `git_attr_name(check[i].attr)`.  (Please note that no items will be
-  returned for unset attributes, so `ATTR_UNSET()` will return false
-  for all returned `git_array_check` objects.)
+* Iterate over the `attr_check.check[]` array to examine
+  the attribute names and values.  The name of the attribute
+  described by a  `attr_check.check[]` object can be retrieved via
+  `git_attr_name(check->check[i].attr)`.  (Please note that no items
+  will be returned for unset attributes, so `ATTR_UNSET()` will return
+  false for all returned `attr_check.check[]` objects.)
 
-* Free the `git_array_check` array.
+* Free the `attr_check` struct by calling `attr_check_free()`.
diff --git a/attr.c b/attr.c
index d2eaa0410..da727e3fd 100644
--- a/attr.c
+++ b/attr.c
@@ -821,7 +821,8 @@ static void collect_some_attrs(const char *path, int num,
 		rem = fill(path, pathlen, basename_offset, stk, rem);
 }
 
-int git_check_attrs(const char *path, int num, struct attr_check_item *check)
+static int git_check_attrs(const char *path, int num,
+			   struct attr_check_item *check)
 {
 	int i;
 
diff --git a/attr.h b/attr.h
index 971bb9a38..3db9893ef 100644
--- a/attr.h
+++ b/attr.h
@@ -52,7 +52,6 @@ extern void attr_check_free(struct attr_check *check);
  */
 extern const char *git_attr_name(const struct git_attr *);
 
-int git_check_attrs(const char *path, int, struct attr_check_item *);
 extern int git_check_attr(const char *path, struct attr_check *check);
 
 /*
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 17/27] attr: convert git_check_attrs() callers to use the new API
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, sbeller, pclouds, Brandon Williams
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

From: Junio C Hamano <gitster@pobox.com>

The remaining callers are all simple "I have N attributes I am
interested in.  I'll ask about them with various paths one by one".

After this step, no caller to git_check_attrs() remains.  After
removing it, we can extend "struct attr_check" struct with data
that can be used in optimizing the query for the specific N
attributes it contains.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
---
 archive.c              | 24 ++++++------------------
 builtin/pack-objects.c | 19 +++++--------------
 convert.c              | 17 ++++++-----------
 ll-merge.c             | 33 ++++++++++++++-------------------
 userdiff.c             | 19 ++++++++-----------
 ws.c                   | 19 ++++++-------------
 6 files changed, 45 insertions(+), 86 deletions(-)

diff --git a/archive.c b/archive.c
index b76bd4691..3591f7d55 100644
--- a/archive.c
+++ b/archive.c
@@ -87,19 +87,6 @@ void *sha1_file_to_archive(const struct archiver_args *args,
 	return buffer;
 }
 
-static void setup_archive_check(struct attr_check_item *check)
-{
-	static struct git_attr *attr_export_ignore;
-	static struct git_attr *attr_export_subst;
-
-	if (!attr_export_ignore) {
-		attr_export_ignore = git_attr("export-ignore");
-		attr_export_subst = git_attr("export-subst");
-	}
-	check[0].attr = attr_export_ignore;
-	check[1].attr = attr_export_subst;
-}
-
 struct directory {
 	struct directory *up;
 	struct object_id oid;
@@ -120,10 +107,10 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
 		void *context)
 {
 	static struct strbuf path = STRBUF_INIT;
+	static struct attr_check *check;
 	struct archiver_context *c = context;
 	struct archiver_args *args = c->args;
 	write_archive_entry_fn_t write_entry = c->write_entry;
-	struct attr_check_item check[2];
 	const char *path_without_prefix;
 	int err;
 
@@ -137,11 +124,12 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
 		strbuf_addch(&path, '/');
 	path_without_prefix = path.buf + args->baselen;
 
-	setup_archive_check(check);
-	if (!git_check_attrs(path_without_prefix, ARRAY_SIZE(check), check)) {
-		if (ATTR_TRUE(check[0].value))
+	if (!check)
+		check = attr_check_initl("export-ignore", "export-subst", NULL);
+	if (!git_check_attr(path_without_prefix, check)) {
+		if (ATTR_TRUE(check->check[0].value))
 			return 0;
-		args->convert = ATTR_TRUE(check[1].value);
+		args->convert = ATTR_TRUE(check->check[1].value);
 	}
 
 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8b8fbd814..ff8b3c12d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -894,24 +894,15 @@ static void write_pack_file(void)
 			written, nr_result);
 }
 
-static void setup_delta_attr_check(struct attr_check_item *check)
-{
-	static struct git_attr *attr_delta;
-
-	if (!attr_delta)
-		attr_delta = git_attr("delta");
-
-	check[0].attr = attr_delta;
-}
-
 static int no_try_delta(const char *path)
 {
-	struct attr_check_item check[1];
+	static struct attr_check *check;
 
-	setup_delta_attr_check(check);
-	if (git_check_attrs(path, ARRAY_SIZE(check), check))
+	if (!check)
+		check = attr_check_initl("delta", NULL);
+	if (git_check_attr(path, check))
 		return 0;
-	if (ATTR_FALSE(check->value))
+	if (ATTR_FALSE(check->check[0].value))
 		return 1;
 	return 0;
 }
diff --git a/convert.c b/convert.c
index 1b9829279..affd8ce9b 100644
--- a/convert.c
+++ b/convert.c
@@ -1085,24 +1085,19 @@ struct conv_attrs {
 	int ident;
 };
 
-static const char *conv_attr_name[] = {
-	"crlf", "ident", "filter", "eol", "text",
-};
-#define NUM_CONV_ATTRS ARRAY_SIZE(conv_attr_name)
-
 static void convert_attrs(struct conv_attrs *ca, const char *path)
 {
-	int i;
-	static struct attr_check_item ccheck[NUM_CONV_ATTRS];
+	static struct attr_check *check;
 
-	if (!ccheck[0].attr) {
-		for (i = 0; i < NUM_CONV_ATTRS; i++)
-			ccheck[i].attr = git_attr(conv_attr_name[i]);
+	if (!check) {
+		check = attr_check_initl("crlf", "ident", "filter",
+					 "eol", "text", NULL);
 		user_convert_tail = &user_convert;
 		git_config(read_convert_config, NULL);
 	}
 
-	if (!git_check_attrs(path, NUM_CONV_ATTRS, ccheck)) {
+	if (!git_check_attr(path, check)) {
+		struct attr_check_item *ccheck = check->check;
 		ca->crlf_action = git_path_check_crlf(ccheck + 4);
 		if (ca->crlf_action == CRLF_UNDEFINED)
 			ca->crlf_action = git_path_check_crlf(ccheck + 0);
diff --git a/ll-merge.c b/ll-merge.c
index 198f07aca..3a4227a1c 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -336,15 +336,6 @@ static const struct ll_merge_driver *find_ll_merge_driver(const char *merge_attr
 	return &ll_merge_drv[LL_TEXT_MERGE];
 }
 
-static int git_path_check_merge(const char *path, struct attr_check_item check[2])
-{
-	if (!check[0].attr) {
-		check[0].attr = git_attr("merge");
-		check[1].attr = git_attr("conflict-marker-size");
-	}
-	return git_check_attrs(path, 2, check);
-}
-
 static void normalize_file(mmfile_t *mm, const char *path)
 {
 	struct strbuf strbuf = STRBUF_INIT;
@@ -362,7 +353,7 @@ int ll_merge(mmbuffer_t *result_buf,
 	     mmfile_t *theirs, const char *their_label,
 	     const struct ll_merge_options *opts)
 {
-	static struct attr_check_item check[2];
+	static struct attr_check *check;
 	static const struct ll_merge_options default_opts;
 	const char *ll_driver_name = NULL;
 	int marker_size = DEFAULT_CONFLICT_MARKER_SIZE;
@@ -376,10 +367,14 @@ int ll_merge(mmbuffer_t *result_buf,
 		normalize_file(ours, path);
 		normalize_file(theirs, path);
 	}
-	if (!git_path_check_merge(path, check)) {
-		ll_driver_name = check[0].value;
-		if (check[1].value) {
-			marker_size = atoi(check[1].value);
+
+	if (!check)
+		check = attr_check_initl("merge", "conflict-marker-size", NULL);
+
+	if (!git_check_attr(path, check)) {
+		ll_driver_name = check->check[0].value;
+		if (check->check[1].value) {
+			marker_size = atoi(check->check[1].value);
 			if (marker_size <= 0)
 				marker_size = DEFAULT_CONFLICT_MARKER_SIZE;
 		}
@@ -398,13 +393,13 @@ int ll_merge(mmbuffer_t *result_buf,
 
 int ll_merge_marker_size(const char *path)
 {
-	static struct attr_check_item check;
+	static struct attr_check *check;
 	int marker_size = DEFAULT_CONFLICT_MARKER_SIZE;
 
-	if (!check.attr)
-		check.attr = git_attr("conflict-marker-size");
-	if (!git_check_attrs(path, 1, &check) && check.value) {
-		marker_size = atoi(check.value);
+	if (!check)
+		check = attr_check_initl("conflict-marker-size", NULL);
+	if (!git_check_attr(path, check) && check->check[0].value) {
+		marker_size = atoi(check->check[0].value);
 		if (marker_size <= 0)
 			marker_size = DEFAULT_CONFLICT_MARKER_SIZE;
 	}
diff --git a/userdiff.c b/userdiff.c
index b0b44467a..109d4b9fc 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -262,25 +262,22 @@ struct userdiff_driver *userdiff_find_by_name(const char *name) {
 
 struct userdiff_driver *userdiff_find_by_path(const char *path)
 {
-	static struct git_attr *attr;
-	struct attr_check_item check;
-
-	if (!attr)
-		attr = git_attr("diff");
-	check.attr = attr;
+	static struct attr_check *check;
 
+	if (!check)
+		check = attr_check_initl("diff", NULL);
 	if (!path)
 		return NULL;
-	if (git_check_attrs(path, 1, &check))
+	if (git_check_attr(path, check))
 		return NULL;
 
-	if (ATTR_TRUE(check.value))
+	if (ATTR_TRUE(check->check[0].value))
 		return &driver_true;
-	if (ATTR_FALSE(check.value))
+	if (ATTR_FALSE(check->check[0].value))
 		return &driver_false;
-	if (ATTR_UNSET(check.value))
+	if (ATTR_UNSET(check->check[0].value))
 		return NULL;
-	return userdiff_find_by_name(check.value);
+	return userdiff_find_by_name(check->check[0].value);
 }
 
 struct userdiff_driver *userdiff_get_textconv(struct userdiff_driver *driver)
diff --git a/ws.c b/ws.c
index fbd876e84..7556adbd0 100644
--- a/ws.c
+++ b/ws.c
@@ -71,24 +71,17 @@ unsigned parse_whitespace_rule(const char *string)
 	return rule;
 }
 
-static void setup_whitespace_attr_check(struct attr_check_item *check)
-{
-	static struct git_attr *attr_whitespace;
-
-	if (!attr_whitespace)
-		attr_whitespace = git_attr("whitespace");
-	check[0].attr = attr_whitespace;
-}
-
 unsigned whitespace_rule(const char *pathname)
 {
-	struct attr_check_item attr_whitespace_rule;
+	static struct attr_check *attr_whitespace_rule;
+
+	if (!attr_whitespace_rule)
+		attr_whitespace_rule = attr_check_initl("whitespace", NULL);
 
-	setup_whitespace_attr_check(&attr_whitespace_rule);
-	if (!git_check_attrs(pathname, 1, &attr_whitespace_rule)) {
+	if (!git_check_attr(pathname, attr_whitespace_rule)) {
 		const char *value;
 
-		value = attr_whitespace_rule.value;
+		value = attr_whitespace_rule->check[0].value;
 		if (ATTR_TRUE(value)) {
 			/* true (whitespace) */
 			unsigned all_rule = ws_tab_width(whitespace_rule_cfg);
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related

* [PATCH v2 24/27] attr: tighten const correctness with git_attr and match_attr
From: Brandon Williams @ 2017-01-23 20:35 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, gitster, pclouds
In-Reply-To: <20170123203525.185058-1-bmwill@google.com>

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 attr.c               | 14 +++++++-------
 attr.h               |  2 +-
 builtin/check-attr.c |  3 ++-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/attr.c b/attr.c
index ed9ba3756..95456503e 100644
--- a/attr.c
+++ b/attr.c
@@ -209,7 +209,7 @@ static void report_invalid_attr(const char *name, size_t len,
  * dictionary.  If no entry is found, create a new attribute and store it in
  * the dictionary.
  */
-static struct git_attr *git_attr_internal(const char *name, int namelen)
+static const struct git_attr *git_attr_internal(const char *name, int namelen)
 {
 	struct git_attr *a;
 
@@ -233,14 +233,14 @@ static struct git_attr *git_attr_internal(const char *name, int namelen)
 	return a;
 }
 
-struct git_attr *git_attr(const char *name)
+const struct git_attr *git_attr(const char *name)
 {
 	return git_attr_internal(name, strlen(name));
 }
 
 /* What does a matched pattern decide? */
 struct attr_state {
-	struct git_attr *attr;
+	const struct git_attr *attr;
 	const char *setto;
 };
 
@@ -267,7 +267,7 @@ struct pattern {
 struct match_attr {
 	union {
 		struct pattern pat;
-		struct git_attr *attr;
+		const struct git_attr *attr;
 	} u;
 	char is_macro;
 	unsigned num_attr;
@@ -814,7 +814,7 @@ static int fill_one(const char *what, struct attr_check_item *all_attrs,
 	int i;
 
 	for (i = a->num_attr - 1; rem > 0 && i >= 0; i--) {
-		struct git_attr *attr = a->state[i].attr;
+		const struct git_attr *attr = a->state[i].attr;
 		const char **n = &(all_attrs[attr->attr_nr].value);
 		const char *v = a->state[i].setto;
 
@@ -838,7 +838,7 @@ static int fill(const char *path, int pathlen, int basename_offset,
 	const char *base = stk->origin ? stk->origin : "";
 
 	for (i = stk->num_matches - 1; 0 < rem && 0 <= i; i--) {
-		struct match_attr *a = stk->attrs[i];
+		const struct match_attr *a = stk->attrs[i];
 		if (a->is_macro)
 			continue;
 		if (path_matches(path, pathlen, basename_offset,
@@ -988,7 +988,7 @@ struct attr_check *attr_check_initl(const char *one, ...)
 	check->check[0].attr = git_attr(one);
 	va_start(params, one);
 	for (cnt = 1; cnt < check->check_nr; cnt++) {
-		struct git_attr *attr;
+		const struct git_attr *attr;
 		param = va_arg(params, const char *);
 		if (!param)
 			die("BUG: counted %d != ended at %d",
diff --git a/attr.h b/attr.h
index f40524875..9b4dc07d8 100644
--- a/attr.h
+++ b/attr.h
@@ -8,7 +8,7 @@ struct git_attr;
  * Given a string, return the gitattribute object that
  * corresponds to it.
  */
-struct git_attr *git_attr(const char *);
+const struct git_attr *git_attr(const char *);
 
 /* Internal use */
 extern const char git_attr__true[];
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index 3d4704be5..cc6caf7ac 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -166,7 +166,8 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	check = attr_check_alloc();
 	if (!all_attrs) {
 		for (i = 0; i < cnt; i++) {
-			struct git_attr *a = git_attr(argv[i]);
+			const struct git_attr *a = git_attr(argv[i]);
+
 			if (!a)
 				return error("%s: not a valid attribute name",
 					     argv[i]);
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox