From: Fredrik Kuivinen <frekui@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>,
Johannes Sixt <j.sixt@viscovery.net>
Subject: Re: [PATCH v4] Threaded grep
Date: Tue, 26 Jan 2010 13:10:50 +0100
Message-ID: <4c8ef71001260410l2afd2dbx17b6e216bd9e5d8@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.00.1001251542100.3574@localhost.localdomain>
On Tue, Jan 26, 2010 at 00:59, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> The profile for the threaded case says:
>
> 51.73% git libc-2.11.1.so [.] re_search_internal
> 11.47% git [kernel] [k] copy_user_generic_string
> 2.90% git libc-2.11.1.so [.] __strlen_sse2
> 2.66% git [kernel] [k] link_path_walk
> 2.55% git [kernel] [k] intel_pmu_enable_all
> 2.40% git [kernel] [k] __d_lookup
> 1.71% git libc-2.11.1.so [.] __GI___libc_malloc
> 1.55% git [kernel] [k] _raw_spin_lock
> 1.43% git [kernel] [k] sys_futex
> 1.30% git libc-2.11.1.so [.] __cfree
> 1.28% git [kernel] [k] intel_pmu_disable_all
> 1.25% git libc-2.11.1.so [.] __GI_memchr
> 1.14% git libc-2.11.1.so [.] _int_malloc
> 1.02% git [kernel] [k] effective_load
>
> and the only thing that makes me go "eh?" there is the strlen(). Why is
> that so hot? But locking doesn't seem to be the biggest issue, and in
> general I think this is all pretty good. The 'effective_load' thing is the
> scheduler, so there's certainly some context switching going on, probably
> still due to excessive synchronization, but it's equally clear that that
> is certainly not a dominant factor.

I see the strlen in my profiles as well, but I haven't figured out
where it comes from. I get the following:

    51.16%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] 0x000000000b14c6
    10.12%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __GI_strlen
     9.27%  git-grep  [kernel]                           [k] __copy_to_user_ll
     4.68%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __memchr
     1.72%  git-grep  [kernel]                           [k] __d_lookup
     1.18%  git-grep  /lib/i686/cmov/libcrypto.so.0.9.8  [.] sha1_block_asm_data_order
     1.11%  git-grep  [kernel]                           [k] __ticket_spin_lock
     0.84%  git-grep  [vdso]                             [.] 0x00000000b6c422

If I use perf record -g I get

    10.39%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __GI_strlen
            |
            |--99.05%-- look_ahead
            |           grep_buffer_1
            |           grep_buffer
            |           run
            |           start_thread
            |           __clone
            |
            |--0.64%-- grep_file
            |          grep_cache
            |          cmd_grep
            |          run_builtin
            |          handle_internal_command
            |          main
            |          __libc_start_main
            |          0x804ae81
             --0.32%-- [...]

This doesn't make much sense to me, as look_ahead doesn't call strlen
(I compiled git with -O0 to avoid any issues with inlined functions).
But I haven't used perf much, so maybe I'm reading the output the
wrong way.
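
One possible explanation, offered only as a guess that the profile above
does not confirm: the strlen could be reached indirectly through regexec().
Without the non-standard REG_STARTEND flag, regexec() only gets a
NUL-terminated string and has to find its end on every call; as far as I
know, glibc does that with strlen(). A minimal sketch of the two calling
styles follows. The helper names match_cstring and match_region are
hypothetical and are not taken from git's grep.c.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Plain POSIX call: regexec() is handed a NUL-terminated string and
 * must determine its length itself on every call. For a large buffer
 * searched repeatedly, that scan can dominate.
 * Returns 1 on match, 0 otherwise.
 */
static int match_cstring(const regex_t *re, const char *s)
{
	regmatch_t m;

	return regexec(re, s, 1, &m, 0) == 0;
}

/*
 * Match only the first `len` bytes of `buf`.
 * With REG_STARTEND (a BSD/glibc extension, not in POSIX) the caller
 * passes the region in m.rm_so/m.rm_eo, so no length scan is needed.
 * Returns 1 on match, 0 otherwise.
 */
static int match_region(const regex_t *re, const char *buf, size_t len)
{
	regmatch_t m;

	assert(buf != NULL);
#ifdef REG_STARTEND
	m.rm_so = 0;
	m.rm_eo = (regoff_t)len;
	return regexec(re, buf, 1, &m, REG_STARTEND) == 0;
#else
	/* Portable fallback: copy the region and NUL-terminate it. */
	char *copy = malloc(len + 1);
	int hit;

	if (!copy)
		return 0;
	memcpy(copy, buf, len);
	copy[len] = '\0';
	hit = regexec(re, copy, 1, &m, 0) == 0;
	free(copy);
	return hit;
#endif
}
```

If this guess is right, the missing regexec frame in the call graph could
simply mean that libc was built without frame pointers, so the unwinder
skips from strlen straight to look_ahead. That part is an assumption too.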
> One potentially interesting data point is that if I make NR_THREADS be 16,
> performance goes down, and I get more locking overhead. So NR_THREADS of 8
> works well on this machine.
Interesting. I get the best results with 8 threads as well, but I only
have two cores.
> One worry is, of course, whether all regex() implementations are
> thread-safe. Maybe there are broken libraries that have hidden global
> state in them?
That would certainly be a problem. A quick Google search didn't turn up
any known bugs. Of course, this doesn't tell us anything about the
unknown ones.
- Fredrik