git.vger.kernel.org archive mirror
* [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
@ 2013-03-18 12:58 Nguyễn Thái Ngọc Duy
  2013-03-18 17:50 ` Junio C Hamano
From: Nguyễn Thái Ngọc Duy @ 2013-03-18 12:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

perf reports memcpy at the 6th position [1] in "git status -uno"
using index v4, and strbuf_remove() in expand_name_field() accounts
for 25% of that. What we need here is a simple string cut, and the
cheaper strbuf_setlen() should be enough. After this change, memcpy
drops down to the 13th position [2] and is dominated by
read_index_from.

[1] before
+     15.74%   git  git                [.] blk_SHA1_Block
+     13.22%   git  [kernel.kallsyms]  [k] link_path_walk
+     10.91%   git  [kernel.kallsyms]  [k] __d_lookup
+      8.17%   git  [kernel.kallsyms]  [k] strncpy_from_user
+      4.75%   git  [kernel.kallsyms]  [k] memcmp
+      2.42%   git  libc-2.11.2.so     [.] memcpy

[2] after
+     16.30%   git  git                [.] blk_SHA1_Block
+     13.43%   git  [kernel.kallsyms]  [k] link_path_walk
+     11.45%   git  [kernel.kallsyms]  [k] __d_lookup
+      8.73%   git  [kernel.kallsyms]  [k] strncpy_from_user
+      5.14%   git  [kernel.kallsyms]  [k] memcmp
+      2.29%   git  [kernel.kallsyms]  [k] do_lookup
+      2.21%   git  libc-2.11.2.so     [.] 0x6daf6
+      1.98%   git  [kernel.kallsyms]  [k] _atomic_dec_and_lock
+      1.98%   git  [kernel.kallsyms]  [k] _raw_spin_lock
+      1.86%   git  [kernel.kallsyms]  [k] acl_permission_check
+      1.61%   git  [kernel.kallsyms]  [k] kmem_cache_free
+      1.59%   git  git                [.] unpack_trees
+      1.47%   git  libc-2.11.2.so     [.] memcpy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 I was after something else when I noticed this. Seems like a simple
 and safe change.

 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 827ae55..8c443aa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1354,7 +1354,7 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
 
 	if (name->len < len)
 		die("malformed name field in the index");
-	strbuf_remove(name, name->len - len, len);
+	strbuf_setlen(name, name->len - len);
 	for (ep = cp; *ep; ep++)
 		; /* find the end */
 	strbuf_add(name, cp, ep - cp);
-- 
1.8.2.83.gc99314b


* Re: [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
  2013-03-18 12:58 [PATCH] read-cache: avoid memcpy in expand_name_field in index v4 Nguyễn Thái Ngọc Duy
@ 2013-03-18 17:50 ` Junio C Hamano
  2013-03-19  1:29   ` Duy Nguyen
From: Junio C Hamano @ 2013-03-18 17:50 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> perf reports memcpy at the 6th position [1] in "git status -uno"
> using index v4, and strbuf_remove() in expand_name_field() accounts
> for 25% of that. What we need here is a simple string cut, and the
> cheaper strbuf_setlen() should be enough.

While it is true that strbuf_remove(&sb, sb.len - trim, trim) is
equivalent to strbuf_setlen(&sb, sb.len - trim), I wonder why we see
any memcpy() in the first place.

strbuf_remove(&sb, sb.len - trim, trim) is turned into
strbuf_splice(&sb, sb.len - trim, trim, NULL, 0) and then in turn it
does these two:

	memmove(sb.buf + (sb.len - trim) + 0,
                sb.buf + sb.len, 0);
	memcpy(sb.buf + (sb.len - trim), NULL, 0);

both of which should be a no-op, no?

There also is this call that has the same "trim at the right end":

    pretty.c:       strbuf_remove(sb, sb->len - trimlen, trimlen);

It almost makes me suggest that it may be a better solution to make
strbuf_remove() more intelligent about such a call pattern _if_
these empty memmove/memcpy are so expensive, perhaps like the
attached.  It could be that strbuf_splice() should be the one that
ought to be more intelligent, but I'll leave it up to you to
benchmark to find out where the best place to optimize is.

 strbuf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 05d0693..12db700 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -179,7 +179,10 @@ void strbuf_insert(struct strbuf *sb, size_t pos, const void *data, size_t len)
 
 void strbuf_remove(struct strbuf *sb, size_t pos, size_t len)
 {
-	strbuf_splice(sb, pos, len, NULL, 0);
+	if (pos + len == sb->len)
+		strbuf_setlen(sb, pos);
+	else
+		strbuf_splice(sb, pos, len, NULL, 0);
 }
 
 void strbuf_add(struct strbuf *sb, const void *data, size_t len)


* Re: [PATCH] read-cache: avoid memcpy in expand_name_field in index v4
  2013-03-18 17:50 ` Junio C Hamano
@ 2013-03-19  1:29   ` Duy Nguyen
From: Duy Nguyen @ 2013-03-19  1:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Mar 19, 2013 at 12:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
> While it is true that strbuf_remove(&sb, sb.len - trim, trim) is
> equivalent to strbuf_setlen(&sb, sb.len - trim), I wonder why we see
> any memcpy() in the first place.
>
> strbuf_remove(&sb, sb.len - trim, trim) is turned into
> strbuf_splice(&sb, sb.len - trim, trim, NULL, 0) and then in turn it
> does these two:
>
>         memmove(sb.buf + (sb.len - trim) + 0,
>                 sb.buf + sb.len, 0);
>         memcpy(sb.buf + (sb.len - trim), NULL, 0);
>
> both of which should be a no-op, no?

Apparently my memcpy does not bail out early when the third arg is
zero (glibc 2.11.2 on Gentoo, x86); it cares more about memory
alignment. This is the beginning of memcpy:

mov    %edi,%eax
mov    0x4(%esp),%edi
mov    %esi,%edx
mov    0x8(%esp),%esi
mov    %edi,%ecx
xor    %esi,%ecx
and    $0x3,%ecx
mov    0xc(%esp),%ecx
cld
jne    75946 <memcpy+0x56>
cmp    $0x3,%ecx
jbe    75946 <memcpy+0x56>


> There also is this call that has the same "trim at the right end":
>
>     pretty.c:       strbuf_remove(sb, sb->len - trimlen, trimlen);
>
> It almost makes me suggest that it may be a better solution to make
> strbuf_remove() more intelligent about such a call pattern _if_
> these empty memmove/memcpy are so expensive, perhaps like the
> attached.  It could be that strbuf_splice() should be the one that
> ought to be more intelligent, but I'll leave it up to you to
> benchmark to find out where the best place to optimize is.

memcpy is not expensive per se, but this is (again) WebKit, where
expand_name_field() (and memcpy) is called ~200k times. At that
quantity I still prefer fixing the "hot" call site,
expand_name_field(), and because strbuf_setlen() is an inline
function, we make no extra calls.

Making strbuf_remove()/strbuf_splice() more intelligent may be good
(or may just make them harder to read); I don't know. But I think
that could be a separate topic.
-- 
Duy
