From: Brandon Williams <bmwill@google.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, sbeller@google.com
Subject: Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
Date: Wed, 12 Jul 2017 11:49:39 -0700
Message-ID: <20170712184939.GF65927@google.com>
In-Reply-To: <20170712002533.GD93855@aiede.mtv.corp.google.com>

On 07/11, Jonathan Nieder wrote:
> Hi,
> 
> Brandon Williams wrote:
> 
> > Convert grep to use 'struct repository' which enables recursing into
> > submodules to be handled in-process.
> 
> \o/
> 
> This will be even nicer with the changes described at
> https://public-inbox.org/git/20170706202739.6056-1-sbeller@google.com/.
> Until then, I fear it will cause a regression --- see (*) below.
> 
> [...]
> >  Documentation/git-grep.txt |   7 -
> >  builtin/grep.c             | 390 +++++++++------------------------------------
> >  cache.h                    |   1 -
> >  git.c                      |   2 +-
> >  grep.c                     |  13 --
> >  grep.h                     |   1 -
> >  setup.c                    |  12 +-
> >  7 files changed, 81 insertions(+), 345 deletions(-)
> 
> Yay, tests still pass.
> 
> [..]
> > --- a/Documentation/git-grep.txt
> > +++ b/Documentation/git-grep.txt
> > @@ -95,13 +95,6 @@ OPTIONS
> >  	<tree> option the prefix of all submodule output will be the name of
> >  	the parent project's <tree> object.
> >  
> > ---parent-basename <basename>::
> > -	For internal use only.  In order to produce uniform output with the
> > -	--recurse-submodules option, this option can be used to provide the
> > -	basename of a parent's <tree> object to a submodule so the submodule
> > -	can prefix its output with the parent's name rather than the SHA1 of
> > -	the submodule.
> 
> Being able to get rid of this is a very nice change.
> 
> [...]
> > +++ b/builtin/grep.c
> [...]
> > @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> >  
> > -	if (super_prefix)
> > -		strbuf_addstr(&buf, super_prefix);
> > -	strbuf_addstr(&buf, filename);
> > -
> >  	if (opt->relative && opt->prefix_length) {
> > -		char *name = strbuf_detach(&buf, NULL);
> > -		quote_path_relative(name, opt->prefix, &buf);
> > -		free(name);
> > +		quote_path_relative(filename, opt->prefix, &buf);
> > +	} else {
> > +		strbuf_addstr(&buf, filename);
> >  	}
> 
> style micronit: can avoid these braces since both branches are
> single-line.

I didn't notice that among all the deleted lines; I'll fix it in the
next version.

> 
> [...]
> > @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
> >  		exit(status);
> >  }
> >  
> > -static void compile_submodule_options(const struct grep_opt *opt,
> > -				      const char **argv,
> > -				      int cached, int untracked,
> > -				      int opt_exclude, int use_index,
> > -				      int pattern_type_arg)
> > -{
> [...]
> > -	/*
> > -	 * Limit number of threads for child process to use.
> > -	 * This is to prevent potential fork-bomb behavior of git-grep as each
> > -	 * submodule process has its own thread pool.
> > -	 */
> > -	argv_array_pushf(&submodule_options, "--threads=%d",
> > -			 (num_threads + 1) / 2);
> 
> Being able to get rid of this is another very nice change.
> 
> [...]
> > +	/* add objects to alternates */
> > +	add_to_alternates_memory(submodule.objectdir);
> 
> (*) This sets up a single in-memory object store with all the
> processed submodules.  Processed objects are never freed.
> This means that if I run a command like
> 
> 	git grep --recurse-submodules -e neverfound HEAD
> 
> in a project with many submodules then memory consumption scales in
> the same way as if the project were all one repository.  By contrast,
> without this patch, git is able to take advantage of the implicit
> free() when each child exits to limit its memory usage.
> 
> Worse, this increases the number of pack files git has to pay
> attention to, up to the sum of the numbers of pack files in all the
> repositories processed so far.  A single object lookup can take
> O(number of packs * log(number of objects in each pack)) time.  That
> means performance is likely to suffer as the number of submodules
> increases (n^2 performance) even on systems with a lot of memory.
> 
> Once the object store is part of the repository struct and freeable,
> those problems go away and this patch becomes a no-brainer.
> 
> What should happen until then?  Should this go in "next" so we can get
> experience with it but with care not to let it graduate to "master"?

I agree that this is an issue and that we need to address it by having
an object store per repository.  That is being worked on (by Stefan),
but I don't know how long it will take to become a reality.  So the
question ends up being: do we care more about the state of the code and
cleaning up a lot of the 'hacks' that I introduced to get grep working
with submodules, or do we care more about performance?  I don't know
which is the right answer, but I'd personally like to see the hacks I
added removed sooner rather than later.  I also think that having the
code in this state would make it easier to transition once we have
per-repository object stores.

Either way, I should add a NEEDSWORK comment here to indicate that the
add_to_alternates_memory() call should be removed once per-repo object
stores exist.
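
To make the pack-count concern concrete, here is a toy cost model in C.
It is only a sketch and not git code: the function names and parameters
are made up for illustration, it pessimistically assumes that every
lookup consults every registered pack, and it ignores the
log(number of objects) factor inside each pack.

```c
/*
 * Toy cost model (hypothetical, not part of git): count how many packs
 * are consulted in total when each of nr_submodules repositories owns
 * packs_per_repo pack files and performs lookups_per_repo object lookups.
 */

/* Shared in-memory store: packs of every repository seen so far stay registered. */
unsigned long probes_shared_store(unsigned long nr_submodules,
				  unsigned long packs_per_repo,
				  unsigned long lookups_per_repo)
{
	unsigned long total = 0, visible_packs = 0, i;

	for (i = 0; i < nr_submodules; i++) {
		/* alternates are only ever added, never dropped */
		visible_packs += packs_per_repo;
		total += lookups_per_repo * visible_packs;
	}
	return total;
}

/* Per-repository store: only the current repository's packs are visible. */
unsigned long probes_per_repo_store(unsigned long nr_submodules,
				    unsigned long packs_per_repo,
				    unsigned long lookups_per_repo)
{
	return nr_submodules * lookups_per_repo * packs_per_repo;
}
```

For example, with 100 submodules, one pack each and one lookup each, the
shared model counts 5050 pack probes while the per-repository model
counts 100, which is the quadratic versus linear growth described above.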

> 
> Aside from those two concerns, this patch looks very good from a quick
> skim, though I haven't reviewed it closely line-by-line.  Once we know
> how to go forward, I'm happy to look at it again.
> 
> Thanks,
> Jonathan

-- 
Brandon Williams
