* [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
From: Nguyễn Thái Ngọc Duy @ 2013-03-18 12:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
perf reports memcpy at the 6th position [1] in "git status -uno"
using index v4, and strbuf_remove() in expand_name_field() accounts
for 25% of that. What we need here is a simple string cut, and the
cheaper strbuf_setlen() should be enough. After this change, memcpy
drops to the 13th position [2] and is dominated by read_index_from.
[1] before
+ 15.74% git git [.] blk_SHA1_Block
+ 13.22% git [kernel.kallsyms] [k] link_path_walk
+ 10.91% git [kernel.kallsyms] [k] __d_lookup
+ 8.17% git [kernel.kallsyms] [k] strncpy_from_user
+ 4.75% git [kernel.kallsyms] [k] memcmp
+ 2.42% git libc-2.11.2.so [.] memcpy
[2] after
+ 16.30% git git [.] blk_SHA1_Block
+ 13.43% git [kernel.kallsyms] [k] link_path_walk
+ 11.45% git [kernel.kallsyms] [k] __d_lookup
+ 8.73% git [kernel.kallsyms] [k] strncpy_from_user
+ 5.14% git [kernel.kallsyms] [k] memcmp
+ 2.29% git [kernel.kallsyms] [k] do_lookup
+ 2.21% git libc-2.11.2.so [.] 0x6daf6
+ 1.98% git [kernel.kallsyms] [k] _atomic_dec_and_lock
+ 1.98% git [kernel.kallsyms] [k] _raw_spin_lock
+ 1.86% git [kernel.kallsyms] [k] acl_permission_check
+ 1.61% git [kernel.kallsyms] [k] kmem_cache_free
+ 1.59% git git [.] unpack_trees
+ 1.47% git libc-2.11.2.so [.] memcpy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
I was after something else when I noticed this. Seems like a simple
and safe change.
read-cache.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/read-cache.c b/read-cache.c
index 827ae55..8c443aa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1354,7 +1354,7 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
if (name->len < len)
die("malformed name field in the index");
- strbuf_remove(name, name->len - len, len);
+ strbuf_setlen(name, name->len - len);
for (ep = cp; *ep; ep++)
; /* find the end */
strbuf_add(name, cp, ep - cp);
--
1.8.2.83.gc99314b
* Re: [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
From: Junio C Hamano @ 2013-03-18 17:50 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> perf reports memcpy at the 6th position [1] in "git status -uno"
> using index v4, and strbuf_remove() in expand_name_field() accounts
> for 25% of that. What we need here is a simple string cut, and the
> cheaper strbuf_setlen() should be enough.
While it is true that strbuf_remove(&sb, sb.len - trim, trim) is
equivalent to strbuf_setlen(&sb, sb.len - trim), I wonder why we see
any memcpy() in the first place.
strbuf_remove(&sb, sb.len - trim, trim) is turned into
strbuf_splice(&sb, sb.len - trim, trim, NULL, 0) and then in turn it
does these two:
memmove(sb.buf + (sb.len - trim) + 0,
sb.buf + sb.len, 0);
memcpy(sb.buf + (sb.len - trim), NULL, 0);
both of which should be a no-op, no?
There also is this call that has the same "trim at the right end":
pretty.c: strbuf_remove(sb, sb->len - trimlen, trimlen);
It almost makes me suggest that it may be a better solution to make
strbuf_remove() more intelligent about such a call pattern _if_
these empty memmove/memcpy are so expensive, perhaps like the
attached. It could be that strbuf_splice() should be the one that
ought to be more intelligent, but I'll leave it up to you to
benchmark to find out where the best place to optimize is.
strbuf.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/strbuf.c b/strbuf.c
index 05d0693..12db700 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -179,7 +179,10 @@ void strbuf_insert(struct strbuf *sb, size_t pos, const void *data, size_t len)
void strbuf_remove(struct strbuf *sb, size_t pos, size_t len)
{
- strbuf_splice(sb, pos, len, NULL, 0);
+ if (pos + len == sb->len)
+ strbuf_setlen(sb, pos);
+ else
+ strbuf_splice(sb, pos, len, NULL, 0);
}
void strbuf_add(struct strbuf *sb, const void *data, size_t len)
* Re: [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
From: Duy Nguyen @ 2013-03-19 1:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Tue, Mar 19, 2013 at 12:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
> While it is true that strbuf_remove(&sb, sb.len - trim, trim) is
> equivalent to strbuf_setlen(&sb, sb.len - trim), I wonder why we see
> any memcpy() in the first place.
>
> strbuf_remove(&sb, sb.len - trim, trim) is turned into
> strbuf_splice(&sb, sb.len - trim, trim, NULL, 0) and then in turn it
> does these two:
>
> memmove(sb.buf + (sb.len - trim) + 0,
> sb.buf + sb.len, 0);
> memcpy(sb.buf + (sb.len - trim), NULL, 0);
>
> both of which should be a no-op, no?
Apparently my memcpy does not bail out early when the third arg is
zero (glibc 2.11.2 on Gentoo, x86). It cares more about memory
alignment. This is the beginning of memcpy:
mov %edi,%eax
mov 0x4(%esp),%edi
mov %esi,%edx
mov 0x8(%esp),%esi
mov %edi,%ecx
xor %esi,%ecx
and $0x3,%ecx
mov 0xc(%esp),%ecx
cld
jne 75946 <memcpy+0x56>
cmp $0x3,%ecx
jbe 75946 <memcpy+0x56>
> There also is this call that has the same "trim at the right end":
>
> pretty.c: strbuf_remove(sb, sb->len - trimlen, trimlen);
>
> It almost makes me suggest that it may be a better solution to make
> strbuf_remove() more intelligent about such a call pattern _if_
> these empty memmove/memcpy are so expensive, perhaps like the
> attached. It could be that strbuf_splice() should be the one that
> ought to be more intelligent, but I'll leave it up to you to
> benchmark to find out where the best place to optimize is.
memcpy is not expensive per se, but this is (again) webkit, where
expand_name_field() (and memcpy) is called ~200k times. At that
quantity I still prefer fixing the "hot" call site,
expand_name_field(), and because strbuf_setlen() is an inline
function, we make no extra calls. Making strbuf_remove() or
strbuf_splice() more intelligent may be good (or may make them harder
to read), I don't know. But I think that could be a separate topic.
--
Duy