* [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
@ 2013-03-18 12:58 Nguyễn Thái Ngọc Duy
  2013-03-18 17:50 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread

From: Nguyễn Thái Ngọc Duy @ 2013-03-18 12:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

perf reports memcpy at the 6th position [1] in "git status -uno"
using index v4, and strbuf_remove() in expand_name_field() accounts
for 25% of that. What we need here is a simple string cut, and the
cheaper strbuf_setlen() should be enough. After this change, memcpy
drops down to the 13th position [2] and is dominated by
read_index_from.

[1] before

 + 15.74%  git  git                [.] blk_SHA1_Block
 + 13.22%  git  [kernel.kallsyms]  [k] link_path_walk
 + 10.91%  git  [kernel.kallsyms]  [k] __d_lookup
 +  8.17%  git  [kernel.kallsyms]  [k] strncpy_from_user
 +  4.75%  git  [kernel.kallsyms]  [k] memcmp
 +  2.42%  git  libc-2.11.2.so     [.] memcpy

[2] after

 + 16.30%  git  git                [.] blk_SHA1_Block
 + 13.43%  git  [kernel.kallsyms]  [k] link_path_walk
 + 11.45%  git  [kernel.kallsyms]  [k] __d_lookup
 +  8.73%  git  [kernel.kallsyms]  [k] strncpy_from_user
 +  5.14%  git  [kernel.kallsyms]  [k] memcmp
 +  2.29%  git  [kernel.kallsyms]  [k] do_lookup
 +  2.21%  git  libc-2.11.2.so     [.] 0x6daf6
 +  1.98%  git  [kernel.kallsyms]  [k] _atomic_dec_and_lock
 +  1.98%  git  [kernel.kallsyms]  [k] _raw_spin_lock
 +  1.86%  git  [kernel.kallsyms]  [k] acl_permission_check
 +  1.61%  git  [kernel.kallsyms]  [k] kmem_cache_free
 +  1.59%  git  git                [.] unpack_trees
 +  1.47%  git  libc-2.11.2.so     [.] memcpy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 I was after something else when I noticed this. Seems like a simple
 and safe change.
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 827ae55..8c443aa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1354,7 +1354,7 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
 	if (name->len < len)
 		die("malformed name field in the index");
-	strbuf_remove(name, name->len - len, len);
+	strbuf_setlen(name, name->len - len);
 	for (ep = cp; *ep; ep++)
 		; /* find the end */
 	strbuf_add(name, cp, ep - cp);
--
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 3+ messages in thread
* Re: [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
  2013-03-18 12:58 [PATCH] read-cache: avoid memcpy in expand_name_field in index v4 Nguyễn Thái Ngọc Duy
@ 2013-03-18 17:50 ` Junio C Hamano
  2013-03-19  1:29   ` Duy Nguyen
  0 siblings, 1 reply; 3+ messages in thread

From: Junio C Hamano @ 2013-03-18 17:50 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> perf reports memcpy at the 6th position [1] in "git status -uno"
> using index v4, and strbuf_remove() in expand_name_field() accounts
> for 25% of that. What we need here is a simple string cut, and the
> cheaper strbuf_setlen() should be enough.

While it is true that strbuf_remove(&sb, sb.len - trim, trim) is
equivalent to strbuf_setlen(&sb, sb.len - trim), I wonder why we see
any memcpy() in the first place.

strbuf_remove(&sb, sb.len - trim, trim) is turned into
strbuf_splice(&sb, sb.len - trim, trim, NULL, 0), and that in turn
does these two:

	memmove(sb.buf + (sb.len - trim) + 0,
		sb.buf + sb.len, 0);
	memcpy(sb.buf + (sb.len - trim), NULL, 0);

both of which should be a no-op, no?

There also is this call that has the same "trim at the right end":

	pretty.c:	strbuf_remove(sb, sb->len - trimlen, trimlen);

It almost makes me suggest that it may be a better solution to make
strbuf_remove() more intelligent about such a call pattern _if_
these empty memmove/memcpy are so expensive, perhaps like the
attached. It could be that strbuf_splice() should be the one that
ought to be more intelligent, but I'll leave it up to you to
benchmark to find out where the best place to optimize is.
 strbuf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 05d0693..12db700 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -179,7 +179,10 @@ void strbuf_insert(struct strbuf *sb, size_t pos, const void *data, size_t len)

 void strbuf_remove(struct strbuf *sb, size_t pos, size_t len)
 {
-	strbuf_splice(sb, pos, len, NULL, 0);
+	if (pos + len == sb->len)
+		strbuf_setlen(sb, pos);
+	else
+		strbuf_splice(sb, pos, len, NULL, 0);
 }

 void strbuf_add(struct strbuf *sb, const void *data, size_t len)

^ permalink raw reply related	[flat|nested] 3+ messages in thread
* Re: [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
  2013-03-18 17:50 ` Junio C Hamano
@ 2013-03-19  1:29   ` Duy Nguyen
  0 siblings, 0 replies; 3+ messages in thread

From: Duy Nguyen @ 2013-03-19  1:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Mar 19, 2013 at 12:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
> While it is true that strbuf_remove(&sb, sb.len - trim, trim) is
> equivalent to strbuf_setlen(&sb, sb.len - trim), I wonder why we see
> any memcpy() in the first place.
>
> strbuf_remove(&sb, sb.len - trim, trim) is turned into
> strbuf_splice(&sb, sb.len - trim, trim, NULL, 0), and that in turn
> does these two:
>
> 	memmove(sb.buf + (sb.len - trim) + 0,
> 		sb.buf + sb.len, 0);
> 	memcpy(sb.buf + (sb.len - trim), NULL, 0);
>
> both of which should be a no-op, no?

Apparently my memcpy does not bail out early when the third arg is
zero (glibc 2.11.2 on gentoo, x86). It cares more about memory
alignment. This is the beginning of memcpy:

	mov    %edi,%eax
	mov    0x4(%esp),%edi
	mov    %esi,%edx
	mov    0x8(%esp),%esi
	mov    %edi,%ecx
	xor    %esi,%ecx
	and    $0x3,%ecx
	mov    0xc(%esp),%ecx
	cld
	jne    75946 <memcpy+0x56>
	cmp    $0x3,%ecx
	jbe    75946 <memcpy+0x56>

> There also is this call that has the same "trim at the right end":
>
> 	pretty.c:	strbuf_remove(sb, sb->len - trimlen, trimlen);
>
> It almost makes me suggest that it may be a better solution to make
> strbuf_remove() more intelligent about such a call pattern _if_
> these empty memmove/memcpy are so expensive, perhaps like the
> attached. It could be that strbuf_splice() should be the one that
> ought to be more intelligent, but I'll leave it up to you to
> benchmark to find out where the best place to optimize is.

memcpy is not expensive per se, but this is (again) webkit, where
expand_name_field (and memcpy) is called ~200k times. At that quantity
I still prefer fixing the hot call site expand_name_field(), and
because strbuf_setlen is an inline function, we make no extra calls.
Making strbuf_remove()/strbuf_splice() more intelligent may be good
(or may make them harder to read), I don't know. But I think that
could be a separate topic.
--
Duy

^ permalink raw reply	[flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-03-19  1:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-18 12:58 [PATCH] read-cache: avoid memcpy in expand_name_field in index v4 Nguyễn Thái Ngọc Duy
2013-03-18 17:50 ` Junio C Hamano
2013-03-19  1:29   ` Duy Nguyen