From: Thomas Rast <trast@student.ethz.ch>
To: Jeff King <peff@peff.net>
Cc: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
	"Eric Herman" <eric@freesa.org>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2 0/3] grep multithreading and scaling
Date: Mon, 5 Dec 2011 10:38:16 +0100
Message-ID: <201112051038.16423.trast@student.ethz.ch>
In-Reply-To: <20111202173400.GC23447@sigill.intra.peff.net>

Jeff King wrote:
> 
> A quick perf run shows most of the time is spent inflating objects. The
> diff code has a sneaky trick to re-use worktree files when we know they
> are stat-clean (in diff's case it is to avoid writing a tempfile). I
> wonder if we should use the same trick here.
> 
> It would hurt the cold cache case, though, as the compressed versions
> require fewer disk accesses, of course.

I just found out that Linux has mincore(), which can tell us (racily,
but who cares) whether the pages of a given file mapping are in memory.
If you would like to try it, see the source at the end; I'm getting
results such as

  # in a random collection of files, none of which I have accessed lately
  $ ls -l
  -rw-r--r-- 1 thomas users    116534 Jul  4  2010 IMG_4884.JPG
  -rw-r--r-- 1 thomas users   7278081 Aug 25  2010 remoteserverrepo.zip
  $ ./mincore IMG_4884.JPG 
  00000000000000000000000000000
  $ cat IMG_4884.JPG > /dev/null 
  $ ./mincore IMG_4884.JPG 
  11111111111111111111111111111
  $ ./mincore remoteserverrepo.zip 
  0000000000000000000000[...]
  $ head -10 remoteserverrepo.zip >/dev/null
  $ ./mincore remoteserverrepo.zip 
  1111000000000000000000[...]

So that looks fairly promising, and the order would then be:

- if stat-clean, and we have mincore(), and it tells us we can do it
  cheaply: grab file from tree

- if it's a loose object: decompress it

- if stat-clean: grab file from tree

- access packs as usual
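
The order above can be sketched as a small decision function.  This is
only an illustration, not code from git: the enum, the struct and the
name pick_blob_source() are all made up here, and the flags are assumed
to have been computed by the caller beforehand.

```c
/* Hypothetical names throughout; nothing here exists in git. */
enum blob_source {
	SRC_WORKTREE_CACHED,	/* stat-clean and mincore() says it is in memory */
	SRC_LOOSE_OBJECT,	/* inflate the loose object */
	SRC_WORKTREE,		/* stat-clean, but may have to hit the disk */
	SRC_PACK		/* access packs as usual */
};

struct blob_hints {
	int stat_clean;		/* worktree file matches the index entry */
	int have_mincore;	/* platform provides mincore() */
	int resident;		/* mincore() reported the file as in memory */
	int is_loose;		/* object is stored as a loose object */
};

/* Walk the four cases in the order listed above. */
static enum blob_source pick_blob_source(const struct blob_hints *h)
{
	if (h->stat_clean && h->have_mincore && h->resident)
		return SRC_WORKTREE_CACHED;
	if (h->is_loose)
		return SRC_LOOSE_OBJECT;
	if (h->stat_clean)
		return SRC_WORKTREE;
	return SRC_PACK;
}
```

Keeping the decision separate from the I/O means the (racy) mincore()
answer is only ever used as a hint; a wrong answer costs time, not
correctness.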

> PS I suspect your timings are somewhat affected by the simplicity of the
>    regex you are asking for. The time to inflate the blobs dominates,
>    because the search is just a memmem(). On my quad-core w/
>    hyperthreading (i.e., 8 apparent cores):
> 
>    $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null
>    0.42user 0.45system 0:00.15elapsed 578%CPU
>    $ /usr/bin/time git grep 'a.*b' >/dev/null
>    14.68user 0.50system 0:02.00elapsed 758%CPU
>    $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null
>    7.64user 0.41system 0:07.61elapsed 105%CPU
>    $ /usr/bin/time git grep --cached 'a.*b' >/dev/null
>    23.46user 0.47system 0:08.42elapsed 284%CPU
> 
>    So I think there is value in parallelizing even --cached greps. But
>    we could do so much better if blob inflation could be done in
>    parallel.

Ok, I see, I missed that part.  Perhaps the heuristic should then be
"if the regex boils down to memmem, disable threading", but let's see
what loose object decompression in parallel can give us.
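
A rough sketch of what "boils down to memmem" could mean, assuming the
simplest possible test: the pattern contains no regex metacharacters at
all.  The function name pattern_is_literal() is hypothetical, and the
check is deliberately conservative (it treats every BRE or ERE special
character, including a backslash, as "not literal").

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the pattern has no regex
 * metacharacters and could therefore be matched with a plain
 * substring search, 0 otherwise.
 */
static int pattern_is_literal(const char *pattern)
{
	return strpbrk(pattern, ".*[]^$\\+?{}()|") == NULL;
}
```

With the two patterns from the timings above, INITRAMFS_ROOT_UID counts
as literal and 'a.*b' does not.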


---- 8< ---- mincore.c ---- 8< ----
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

void die(const char *s)
{
	perror(s);
	exit(1);
}

int main(int argc, char *argv[])
{
	void *mem;
	struct stat st;
	int fd;
	unsigned char *vec;
	size_t vsize;
	size_t i;
	size_t page = sysconf(_SC_PAGESIZE);

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		exit(2);
	}

	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		die("open failed");
	if (fstat(fd, &st) == -1)
		die("fstat failed");
	if (st.st_size == 0) {
		/* mmap() rejects zero-length mappings; no pages to report */
		printf("\n");
		return 0;
	}
	mem = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		die("mmap failed");

	/* one byte of output per page, rounding the file size up */
	vsize = (st.st_size+page-1)/page;
	vec = malloc(vsize);
	if (!vec)
		die("malloc failed");
	if (mincore(mem, st.st_size, vec) == -1)
		die("mincore failed");
	/* only the least significant bit of each byte is defined */
	for (i = 0; i < vsize; i++)
		printf("%d", vec[i] & 1);
	printf("\n");
	return 0;
}
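
For the "can we do it cheaply" check discussed earlier, a yes/no answer
is more useful than a per-page dump.  The following is a minimal sketch
under the same Linux assumptions as mincore.c; the function name
file_fully_resident() is made up for illustration, and an empty file is
(arbitrarily) treated as resident.

```c
#define _DEFAULT_SOURCE		/* for mincore() on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if every page of the file is in
 * memory, 0 if at least one page is not, and -1 on error.
 * The answer is racy: pages can be evicted right after the call.
 */
static int file_fully_resident(const char *path)
{
	struct stat st;
	unsigned char *vec = NULL;
	void *mem = MAP_FAILED;
	size_t npages, i;
	long page = sysconf(_SC_PAGESIZE);
	int ret = -1;
	int fd = open(path, O_RDONLY);

	if (fd == -1 || page <= 0)
		goto out;
	if (fstat(fd, &st) == -1)
		goto out;
	if (st.st_size == 0) {
		ret = 1;	/* nothing to read, so nothing can be missing */
		goto out;
	}
	mem = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		goto out;
	npages = ((size_t)st.st_size + page - 1) / page;
	vec = malloc(npages);
	if (!vec || mincore(mem, st.st_size, vec) == -1)
		goto out;
	ret = 1;
	for (i = 0; i < npages; i++) {
		if (!(vec[i] & 1)) {	/* low bit set means resident */
			ret = 0;
			break;
		}
	}
out:
	free(vec);
	if (mem != MAP_FAILED)
		munmap(mem, st.st_size);
	if (fd != -1)
		close(fd);
	return ret;
}
```

A caller would treat both 0 and -1 as "do not bother with the worktree
copy" and fall through to the next source.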


-- 
Thomas Rast
trast@{inf,student}.ethz.ch
