From: Duy Nguyen <pclouds@gmail.com>
To: Ben Peart <peartben@gmail.com>
Cc: Ben Peart <benpeart@microsoft.com>,
Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()
Date: Fri, 2 Nov 2018 16:49:59 +0100
Message-ID: <CACsJy8CVPSe8TWYMrK9MiRCaG36qyWfd42cEPo5844XWuTmqew@mail.gmail.com>
In-Reply-To: <20181102133050.10756-1-peartben@gmail.com>

On Fri, Nov 2, 2018 at 2:32 PM Ben Peart <peartben@gmail.com> wrote:
>
> From: Ben Peart <benpeart@microsoft.com>
>
> During an "add", a call is made to run_diff_files() which calls
> check_remove() for each index-entry. The preload_index() code distributes
> some of the costs across multiple threads.

Instead of doing this site by site, how about we make read_cache()
always do multithreaded preload?

The only downside I see is that preload may actually hurt when there are
too few cache entries (but more than 500), but this needs to be
verified. If the penalty is small enough, I think we could live with
it, since everything is fast when you have few entries anyway.

But if that's not true, we could add a threshold to activate preload.
Something like "if you have 50k files or more, then activate preload"
would do. I notice THREAD_COST in the preload code, but I don't think
it's the same thing.
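
To make the threshold idea concrete, here is a rough sketch of how an
activation threshold could sit in front of a per-thread cost. Everything
in it is hypothetical: the function name, the SKETCH_* constants and the
cap of 20 threads are made up for illustration and are not taken from
the preload code. The 500 and 50k figures are just the numbers mentioned
above.

```c
#include <stddef.h>

/* Assumed values for illustration only; not copied from git's source. */
#define SKETCH_THREAD_COST 500	/* entries handled per worker thread */
#define SKETCH_MAX_THREADS 20	/* hypothetical upper bound on workers */

/*
 * Return how many preload worker threads to start for an index with
 * nr_entries entries, or 0 to skip the threaded preload entirely.
 * min_entries is the activation threshold (e.g. 50000).
 */
size_t sketch_preload_threads(size_t nr_entries, size_t min_entries)
{
	size_t threads;

	/* Below the activation threshold: do not preload at all. */
	if (nr_entries < min_entries)
		return 0;

	/* One worker per SKETCH_THREAD_COST entries. */
	threads = nr_entries / SKETCH_THREAD_COST;

	/* A single worker gains nothing over the serial path. */
	if (threads < 2)
		return 0;

	if (threads > SKETCH_MAX_THREADS)
		threads = SKETCH_MAX_THREADS;
	return threads;
}
```

With min_entries set to 0 this reduces to the per-thread cost alone,
which is why I think the two knobs answer different questions: one says
how many threads to use, the other says whether to bother at all.
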
>
> Because the files checked are restricted to pathspec, adding individual
> files makes no measurable impact but on a Windows repo with ~200K files,
> 'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.
>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>
> Notes:
> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/fc4830b545
> Checkout: git fetch https://github.com/benpeart/git add-preload-index-v1 && git checkout fc4830b545
>
> builtin/add.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index ad49806ebf..f65c172299 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -445,11 +445,6 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  		return 0;
>  	}
>
> -	if (read_cache() < 0)
> -		die(_("index file corrupt"));
> -
> -	die_in_unpopulated_submodule(&the_index, prefix);
> -
>  	/*
>  	 * Check the "pathspec '%s' did not match any files" block
>  	 * below before enabling new magic.
> @@ -459,6 +454,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  		       PATHSPEC_SYMLINK_LEADING_PATH,
>  		       prefix, argv);
>
> +	if (read_cache_preload(&pathspec) < 0)
> +		die(_("index file corrupt"));
> +
> +	die_in_unpopulated_submodule(&the_index, prefix);
>  	die_path_inside_submodule(&the_index, &pathspec);
>
>  	if (add_new_files) {
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> --
> 2.18.0.windows.1
>
--
Duy