From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [PATCH v2 00/12] revamping git_check_attr() API
Date: Mon, 16 May 2016 14:05:33 -0700 [thread overview]
Message-ID: <20160516210545.6591-1-gitster@pobox.com> (raw)
A common pattern to check N attributes for many paths is to
(1) prepare an array A of N git_attr_check_elem items;
(2) call git_attr() to intern the N attribute names and fill A;
(3) repeatedly call git_check_attrs() for path with N and A;
A look-up for these N attributes for a single path P scans the
entire attr_stack, starting from the .git/info/attributes file and
then .gitattributes file in the directory the path P is in, going
upwards to find .gitattributes file found in parent directories.
An earlier commit 06a604e6 (attr: avoid heavy work when we know the
specified attr is not defined, 2014-12-28) tried to optimize out
this scanning for one trivial special case: when the attribute being
sought is known not to exist, we do not have to scan for it. While
this may be a cheap and effective heuristic, it would not work well
when N is (much) more than 1.
What we would want is a more customized way to skip irrelevant
entries in the attribute stack, and the definition of irrelevance
is tied to the set of attributes passed to git_check_attrs() call,
i.e. the set of attributes being sought. The data necessary for
this optimization needs to live alongside the set of attributes, but
a simple array of git_attr_check_elem simply does not have any place
for that.
Introduce "struct git_attr_check" that contains N, the number of
attributes being sought, and A, the array that holds N
git_attr_check_elem items, and a function git_check_attr() that
takes a path P and this structure as its parameters. This structure
can later be extended to hold extra data necessary for optimization.
Side Note: While in this series, I am not interested in
specifying the exact optimization, it would help readers to give
what is envisioned. Even when the caller is interested in only
a few attributes (say, "diff", "text" and "group-doc"), we'd walk
the attribute entries in various .gitattributes files, perform
the pattern match and collect attributes for the path. An
earlier commit 06a604e6 (attr: avoid heavy work when we know the
specified attr is not defined, 2014-12-28) tried to optimize out
this scanning for one trivial special case: where an asked-for
attribute does not appear anywhere in these files, we know we do
not have to scan at all.
We can do much better. We can scan these attribute entries that
came from .gitattributes files once and mark the ones that affect
one or more of the attributes we know the caller is interested
in. Then we only go through the marked entries. E.g. if the
caller is interested in "diff", "text" and "group-doc"
attributes, we can mark entries that talks about any of "diff",
"text", "group-doc" and "binary" attributes. The last one is
because it is an attribute macro that expands to affect "diff"
and "text" (these attribute are unset by having "binary").
The way git_attr_check() API is structured expects that the
caller expresses the set of attributes it is interested in
upfront, and reuses to ask the same question about many paths,
and it is already optimized for the case where it does so in
in-order recursive descent of a tree hierarchy by having an
attr_stack that reads, pre-parses and caches contents of the
.gitattributes files found in each directory hierarchy. The
cached optimization data that sits in "struct git_attr_check"
would be effective for this expected access pattern.
The optimization data cannot be file-scope static to attr.c,
even though that might be easier to implement, in anticipation
of having more than one "struct git_attr_check" working
alternatingly, e.g. having more than one pathspec with
":(attr=X)" pathspec magic, i.e.
git cmd ":(attr=X)dir/" ":(attr=Y)dir/ectory/"
Each pathspec element is expected to keep its own "struct
git_attr_check" in the custom data to support pathspec magic
to match with attributes, and the optimization data kept there
would survive repeated calls to git_check_attr() for the same
path with different set of attributes.
The patches in the earliest part of the series have been sent to the
list already; there is no substantial change (I think I made a
typofix in the commit log message found by Eric).
commit.c: use strchrnul() to scan for one line
attr.c: use strchrnul() to scan for one line
attr.c: update a stale comment on "struct match_attr"
attr.c: explain the lack of attr-name syntax check in parse_attr()
attr.c: complete a sentence in a comment
attr.c: mark where #if DEBUG ends more clearly
attr.c: simplify macroexpand_one()
Step 8 is to make sure I can catch all the existing callers and
update the API incrementally.
attr: rename function and struct related to checking attributes
Step 9 is the most important one. By updating the API to allow the
callers hold piece of information more than just a simple array of
<attr, value> pair, it paves the way to add data structures that
will help optimizing the lookup. As a demonstration of the new API,
it converts one caller. The immediate effect to the callers is that
they can lose moderate amount of code, but that is not the point of
this change.
attr: (re)introduce git_check_attr() and struct git_attr_check
The two patches that follow converts remaining callers to the enw
API.
attr: convert git_all_attrs() to use "struct git_attr_check"
attr: convert git_check_attrs() callers to use the new API
The last step retires the old one and updates the document.
attr: retire git_check_attrs() API
Documentation/technical/api-gitattributes.txt | 62 ++++++-----
archive.c | 24 ++---
attr.c | 143 +++++++++++++++++++-------
attr.h | 31 ++++--
builtin/check-attr.c | 55 +++++-----
builtin/pack-objects.c | 19 +---
commit.c | 3 +-
convert.c | 26 ++---
ll-merge.c | 33 +++---
userdiff.c | 19 ++--
ws.c | 19 ++--
11 files changed, 236 insertions(+), 198 deletions(-)
--
2.8.2-748-gfb85f76
next reply other threads:[~2016-05-16 21:05 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-16 21:05 Junio C Hamano [this message]
2016-05-16 21:05 ` [PATCH v2 01/12] commit.c: use strchrnul() to scan for one line Junio C Hamano
2016-05-16 23:19 ` Stefan Beller
2016-05-16 23:41 ` Junio C Hamano
2016-05-16 23:46 ` Stefan Beller
2016-05-16 21:05 ` [PATCH v2 02/12] attr.c: " Junio C Hamano
2016-05-16 23:28 ` Stefan Beller
2016-05-16 23:45 ` Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 03/12] attr.c: update a stale comment on "struct match_attr" Junio C Hamano
2016-05-16 23:34 ` Stefan Beller
2016-05-16 23:39 ` Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 04/12] attr.c: explain the lack of attr-name syntax check in parse_attr() Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 05/12] attr.c: complete a sentence in a comment Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 06/12] attr.c: mark where #if DEBUG ends more clearly Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 07/12] attr.c: simplify macroexpand_one() Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 08/12] attr: rename function and struct related to checking attributes Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 09/12] attr: (re)introduce git_check_attr() and struct git_attr_check Junio C Hamano
2016-05-17 4:19 ` Eric Sunshine
2016-05-17 5:06 ` Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 10/12] attr: convert git_all_attrs() to use "struct git_attr_check" Junio C Hamano
2016-05-17 17:00 ` Junio C Hamano
2016-05-17 21:08 ` Junio C Hamano
2016-05-18 16:34 ` Stefan Beller
2016-05-16 21:05 ` [PATCH v2 11/12] attr: convert git_check_attrs() callers to use the new API Junio C Hamano
2016-05-16 21:05 ` [PATCH v2 12/12] attr: retire git_check_attrs() API Junio C Hamano
2016-05-18 16:51 ` [PATCH v2 00/12] revamping git_check_attr() API Stefan Beller
2016-05-18 18:36 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160516210545.6591-1-gitster@pobox.com \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).