From: Thomas Rast <trast@student.ethz.ch>
To: Jeff King <peff@peff.net>
Cc: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
"Eric Herman" <eric@freesa.org>,
git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2 0/3] grep multithreading and scaling
Date: Mon, 5 Dec 2011 10:38:16 +0100
Message-ID: <201112051038.16423.trast@student.ethz.ch>
In-Reply-To: <20111202173400.GC23447@sigill.intra.peff.net>
Jeff King wrote:
>
> A quick perf run shows most of the time is spent inflating objects. The
> diff code has a sneaky trick to re-use worktree files when we know they
> are stat-clean (in diff's case it is to avoid writing a tempfile). I
> wonder if we should use the same trick here.
>
> It would hurt the cold cache case, though, as the compressed versions
> require fewer disk accesses, of course.

I just found out that on Linux, there's mincore(), which can tell us
(racily, but who cares) whether a given file mapping is in memory. If
you would like to try it, see the source at the end. I'm getting
results such as:

  # in a random collection of files, none of which I have accessed lately
  $ ls -l
  -rw-r--r-- 1 thomas users  116534 Jul  4  2010 IMG_4884.JPG
  -rw-r--r-- 1 thomas users 7278081 Aug 25  2010 remoteserverrepo.zip
  $ ./mincore IMG_4884.JPG
  00000000000000000000000000000
  $ cat IMG_4884.JPG > /dev/null
  $ ./mincore IMG_4884.JPG
  11111111111111111111111111111
  $ ./mincore remoteserverrepo.zip
  0000000000000000000000[...]
  $ head -10 remoteserverrepo.zip >/dev/null
  $ ./mincore remoteserverrepo.zip
  1111000000000000000000[...]

So that looks fairly promising, and the order would then be:

- if stat-clean, and we have mincore(), and it tells us we can do it
  cheaply: grab file from tree
- if it's a loose object: decompress it
- if stat-clean: grab file from tree
- access packs as usual

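
The lookup order in the list above could be sketched roughly as follows.
This is only an illustration of the decision logic: the enum, the struct
and choose_blob_source() are hypothetical names that do not exist in git,
and the flags are assumed to have been computed elsewhere.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of these exist in git. */
enum blob_source {
	SRC_WORKTREE,      /* read the checked-out file */
	SRC_LOOSE_OBJECT,  /* inflate the loose object */
	SRC_PACK           /* go through the packs as usual */
};

struct blob_hints {
	bool stat_clean;    /* index entry matches the worktree file */
	bool have_mincore;  /* platform provides mincore() */
	bool file_in_core;  /* mincore() reported the file as resident */
	bool is_loose;      /* object is available as a loose object */
};

static enum blob_source choose_blob_source(const struct blob_hints *h)
{
	/* cheapest case: the worktree copy is clean and already cached */
	if (h->stat_clean && h->have_mincore && h->file_in_core)
		return SRC_WORKTREE;
	/* a loose object needs inflating, but no pack lookup */
	if (h->is_loose)
		return SRC_LOOSE_OBJECT;
	/* clean worktree copy, even though it may have to come from disk */
	if (h->stat_clean)
		return SRC_WORKTREE;
	return SRC_PACK;
}
```

Note that the worktree file appears twice on purpose: once ahead of the
loose object when it is known to be cheap, and once behind it as a
fallback before the packs.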
> PS I suspect your timings are somewhat affected by the simplicity of the
> regex you are asking for. The time to inflate the blobs dominates,
> because the search is just a memmem(). On my quad-core w/
> hyperthreading (i.e., 8 apparent cores):
>
> $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null
> 0.42user 0.45system 0:00.15elapsed 578%CPU
> $ /usr/bin/time git grep 'a.*b' >/dev/null
> 14.68user 0.50system 0:02.00elapsed 758%CPU
> $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null
> 7.64user 0.41system 0:07.61elapsed 105%CPU
> $ /usr/bin/time git grep --cached 'a.*b' >/dev/null
> 23.46user 0.47system 0:08.42elapsed 284%CPU
>
> So I think there is value in parallelizing even --cached greps. But
> we could do so much better if blob inflation could be done in
> parallel.
Ok, I see, I missed that part. Perhaps the heuristic should then be
"if the regex boils down to memmem, disable threading", but let's see
what loose object decompression in parallel can give us.
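
A minimal sketch of what such a heuristic might look like, assuming a
conservative check for regex metacharacters. pattern_is_literal() and
want_threads() are hypothetical helpers made up for this example, not
what grep.c actually does, and the character set is deliberately too
broad (it may classify a literal pattern as a regex, never the reverse).

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the pattern contains none of the
 * usual basic/extended regex metacharacters, i.e. the search could be
 * done with a plain substring match.
 */
static int pattern_is_literal(const char *pattern)
{
	return pattern[strcspn(pattern, ".*[]\\^$+?{}|()")] == '\0';
}

/*
 * Hypothetical policy: only bother with threads when there is more
 * than one CPU and matching is expected to cost real CPU time.
 */
static int want_threads(const char *pattern, int ncpus)
{
	return ncpus > 1 && !pattern_is_literal(pattern);
}
```

With the two patterns from the timings above, INITRAMFS_ROOT_UID would
count as literal and 'a.*b' would not.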
---- 8< ---- mincore.c ---- 8< ----
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

void die(const char *s)
{
	perror(s);
	exit(1);
}

int main(int argc, char *argv[])
{
	void *mem;
	struct stat st;
	int fd;
	unsigned char *vec;
	size_t vsize;
	size_t i;
	size_t page = sysconf(_SC_PAGESIZE);

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		exit(2);
	}

	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		die("open failed");
	if (fstat(fd, &st) == -1)
		die("fstat failed");

	/* map the whole file; mincore() works on mappings, not on fds */
	mem = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		die("mmap failed");

	/* mincore() fills in one byte per page, so round up */
	vsize = (st.st_size+page-1)/page;
	vec = malloc(vsize);
	if (!vec)
		die("malloc failed");
	if (mincore(mem, st.st_size, vec) == -1)
		die("mincore failed");

	/* only the low bit of each byte says whether the page is resident */
	for (i = 0; i < vsize; i++)
		printf("%d", vec[i] & 1);
	printf("\n");

	return 0;
}
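
For use from within grep, the same check would probably be wanted as a
function rather than a standalone program. The following is a rough
sketch under that assumption; file_is_resident() is a hypothetical name,
and treating "every page resident" as the criterion for "cheap" is just
one possible policy. The answer is as racy as before: pages can be
evicted between the check and the read.

```c
#define _DEFAULT_SOURCE	/* for mincore() on glibc */
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

/*
 * Hypothetical helper: returns 1 if mincore() reports every page of
 * the file as resident, 0 if at least one page is not, -1 on error.
 */
static int file_is_resident(const char *path)
{
	struct stat st;
	unsigned char *vec = NULL;
	void *mem = MAP_FAILED;
	size_t page = (size_t) sysconf(_SC_PAGESIZE);
	size_t len = 0, npages, i;
	int fd, ret = -1;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1)
		goto out;
	if (st.st_size == 0) {
		/* nothing to read; mmap() would reject a zero length */
		ret = 1;
		goto out;
	}
	len = (size_t) st.st_size;
	npages = (len + page - 1) / page;
	mem = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		goto out;
	vec = malloc(npages);
	if (!vec || mincore(mem, len, vec) == -1)
		goto out;
	ret = 1;
	for (i = 0; i < npages; i++) {
		if (!(vec[i] & 1)) {	/* low bit clear: page not in core */
			ret = 0;
			break;
		}
	}
out:
	free(vec);
	if (mem != MAP_FAILED)
		munmap(mem, len);
	close(fd);
	return ret;
}
```

A caller would fall back to the object database whenever this returns
anything other than 1.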
--
Thomas Rast
trast@{inf,student}.ethz.ch