From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH/RFC 4/4] attr: avoid heavy work when we know the specified attr is not defined
Date: Tue, 9 Dec 2014 20:53:25 +0700
Message-ID: <1418133205-18213-5-git-send-email-pclouds@gmail.com>
In-Reply-To: <1418133205-18213-1-git-send-email-pclouds@gmail.com>
If we have never seen attr 'X' in any .gitattributes file examined so
far, we can be sure that 'X' is not defined, so there is no need to
walk the whole attr stack looking for it. This is the purpose of the
new field maybe_real: it is set once 'X' appears on any line that is
not a macro definition.
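To make the idea concrete, here is a minimal sketch of the short-circuit. It is a toy model and not git's code. The names `struct toy_attr`, `struct toy_line`, `toy_lookup()` and the `lines_examined` counter are all hypothetical. Patterns match by exact path only, and a single flat list of lines stands in for the attr stack. The counter plays the role of fill_one() calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of attr.c; not git's real structures. */
struct toy_attr {
	const char *name;
	int maybe_real;	/* set once any non-macro line mentions this attr */
};

struct toy_line {
	const char *pattern;		/* toy: exact path match only */
	const struct toy_attr *attr;
	const char *value;
};

static int lines_examined;	/* per-line work, like fill_one() calls */

/* Returns the value of attr for path, or NULL for "unset". */
static const char *toy_lookup(const char *path, const struct toy_attr *attr,
			      const struct toy_line *lines, size_t nr)
{
	size_t i;

	assert(attr != NULL);
	if (!attr->maybe_real)
		return NULL;	/* never seen anywhere: cannot be set, skip the scan */

	/* Walk backwards: a later line overrides an earlier one. */
	for (i = nr; i-- > 0; ) {
		lines_examined++;
		if (lines[i].attr == attr && !strcmp(lines[i].pattern, path))
			return lines[i].value;
	}
	return NULL;
}
```

If an attr's maybe_real is 0, the lookup returns "unset" and leaves lines_examined unchanged, however many lines exist. An attr that does appear somewhere still costs a scan.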
This optimization breaks down if macros are involved, because at attr
parsing time we can't know for sure which macro would expand to 'X'.
But if we take the pessimistic route and assume all macros are
expanded, we hit the builtin "binary" macro. The "diff" attr defined
in that macro alone would disable this optimization for git-grep. So
we keep the optimization until some attr line _may_ reference a
macro, and only then turn it off.
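The following sketch models the parse-time bookkeeping. It is equally hypothetical and mirrors the two checks this patch adds to parse_attr_line(). A non-macro line marks every attr it mentions as maybe_real. Any mention of an attr already known to be a macro sets a global "cannot trust" flag. The names `struct toy_attr`, `toy_record_line()` and `toy_cannot_trust_maybe_real` are illustrative only and do not come from git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; field names echo attr.c but this is not git code. */
struct toy_attr {
	const char *name;
	int maybe_macro;	/* set when a "[attr]name ..." line defines it */
	int maybe_real;		/* set when a non-macro line mentions it */
};

static int toy_cannot_trust_maybe_real;

/*
 * Record one parsed line. 'defines' is the macro the line defines,
 * or NULL for an ordinary pattern line. 'attrs' are the attrs it mentions.
 */
static void toy_record_line(struct toy_attr *defines,
			    struct toy_attr **attrs, size_t nr)
{
	size_t i;

	if (defines)
		defines->maybe_macro = 1;
	for (i = 0; i < nr; i++) {
		assert(attrs[i] != NULL);
		if (!defines)
			attrs[i]->maybe_real = 1;
		/* A macro may expand to anything: stop trusting maybe_real. */
		if (attrs[i]->maybe_macro)
			toy_cannot_trust_maybe_real = 1;
	}
}
```

Suppose the sketch is fed "[attr]binary -diff -text", then "*.c text", then "*.png binary". The flag stays clear after the first two lines and is set on the third. That third line matches the case of a file that uses the 'binary' macro.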
In git.git, this reduces the number of fill_one() calls for "git grep
abcdefghi" from ~5300 to 3000. The optimization stops when it reads
t/.gitattributes, which uses the 'binary' macro.
"git grep" is actually a good example to justify this patch. The
command checks the "diff" attribute on every file. People usually
don't define this attribute, but without this patch they pay the attr
lookup penalty anyway, in proportion to the number of attr lines in
the repo.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
attr.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/attr.c b/attr.c
index 4ec6186..ba41761 100644
--- a/attr.c
+++ b/attr.c
@@ -33,9 +33,11 @@ struct git_attr {
unsigned h;
int attr_nr;
int maybe_macro;
+ int maybe_real;
char name[FLEX_ARRAY];
};
static int git_attr_nr;
+static int cannot_trust_maybe_real;
static struct git_attr_check *check_all_attr;
static struct git_attr *(git_attr_hash[HASHSIZE]);
@@ -97,6 +99,7 @@ static struct git_attr *git_attr_internal(const char *name, int len)
a->next = git_attr_hash[pos];
a->attr_nr = git_attr_nr++;
a->maybe_macro = 0;
+ a->maybe_real = 0;
git_attr_hash[pos] = a;
REALLOC_ARRAY(check_all_attr, git_attr_nr);
@@ -269,6 +272,10 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
/* Second pass to fill the attr_states */
for (cp = states, i = 0; *cp; i++) {
cp = parse_attr(src, lineno, cp, &(res->state[i]));
+ if (!is_macro)
+ res->state[i].attr->maybe_real = 1;
+ if (res->state[i].attr->maybe_macro)
+ cannot_trust_maybe_real = 1;
}
return res;
@@ -752,11 +759,46 @@ static void collect_all_attrs(const char *path)
rem = fill(path, pathlen, basename_offset, stk, rem);
}
+static void collect_selected_attrs(const char *path, int num,
+ struct git_attr_check *check)
+{
+ struct attr_stack *stk;
+ int i, pathlen, rem, dirlen;
+ int basename_offset;
+
+ pathlen = split_path(path, &dirlen, &basename_offset);
+ prepare_attr_stack(path, dirlen);
+ if (cannot_trust_maybe_real) {
+ for (i = 0; i < git_attr_nr; i++)
+ check_all_attr[i].value = ATTR__UNKNOWN;
+ } else {
+ rem = num;
+ for (i = 0; i < num; i++) {
+ struct git_attr_check *c;
+ c = check_all_attr + check[i].attr->attr_nr;
+ if (check[i].attr->maybe_real)
+ c->value = ATTR__UNKNOWN;
+ else {
+ c->value = ATTR__UNSET;
+ rem--;
+ }
+ }
+ if (!rem)
+ return;
+ }
+ rem = git_attr_nr;
+ for (stk = attr_stack; 0 < rem && stk; stk = stk->prev)
+ rem = fill(path, pathlen, basename_offset, stk, rem);
+}
+
int git_check_attr(const char *path, int num, struct git_attr_check *check)
{
int i;
- collect_all_attrs(path);
+ if (cannot_trust_maybe_real)
+ collect_all_attrs(path);
+ else
+ collect_selected_attrs(path, num, check);
for (i = 0; i < num; i++) {
const char *value = check_all_attr[check[i].attr->attr_nr].value;
--
2.2.0.84.ge9c7a8a