From: "erik elfström" <erik.elfstrom@gmail.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH 3/3] clean: improve performance when removing lots of directories
Date: Tue, 7 Apr 2015 21:55:24 +0200
Message-ID: <CAMpP7NYixn4491EdPTDX+RQFr3VZfuAoUWZ4JXuYg2rqp9XTeg@mail.gmail.com>
In-Reply-To: <CAPig+cQOLJcy-QuACrvd+XrCpP74k0SXxj0rBkNneG5Ovnf47Q@mail.gmail.com>
On Tue, Apr 7, 2015 at 12:10 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Mon, Apr 6, 2015 at 7:48 AM, Erik Elfström <erik.elfstrom@gmail.com> wrote:
>> Before this change, clean used resolve_gitlink_ref to check for the
>> presence of nested git repositories. This had the drawback of creating
>> a ref_cache entry for every directory that should potentially be
>> cleaned. The linear search through the ref_cache list caused a massive
>> performance hit for large number of directories.
>>
>> Teach clean.c:remove_dirs to use setup.c:is_git_directory
>> instead. is_git_directory will actually open HEAD and parse the HEAD
>> ref but this implies a nested git repository and should be rare when
>> cleaning.
>>
>> Using is_git_directory should give a more standardized check for what
>> is and what isn't a git repository but also gives a slight behavioral
>> change. We will now detect and respect bare and empty nested git
>> repositories (only init run). Update t7300 to reflect this.
>>
>> The time to clean an untracked directory containing 100000 sub
>> directories went from 61s to 1.7s after this change.
>
> Impressive.
>
>> Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
>> Helped-by: Jeff King <peff@peff.net>
>
> It is customary for your sign-off to be last.
>
> More below...
>
>> ---
>> diff --git a/builtin/clean.c b/builtin/clean.c
>> index 98c103f..e951bd9 100644
>> --- a/builtin/clean.c
>> +++ b/builtin/clean.c
>> @@ -148,6 +147,24 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
>>  	return 0;
>>  }
>>
>> +static int is_git_repository(struct strbuf *path)
>> +{
>> +	int ret = 0;
>> +	if (is_git_directory(path->buf))
>> +		ret = 1;
>> +	else {
>> +		int orig_path_len = path->len;
>> +		if (path->buf[orig_path_len - 1] != '/')
>
> Minor: I don't know how others feel about it, but I always find it a
> bit disturbing to see a potential negative array access without a
> safety check that orig_path_len is not 0, either directly in the
> conditional or as a documenting assert().
>
I think I would prefer to accept empty input and return false rather
than assert. What do you think about:
static int is_git_repository(struct strbuf *path)
{
	int ret = 0;
	size_t orig_path_len = path->len;
	if (orig_path_len == 0)
		ret = 0;
	else if (is_git_directory(path->buf))
		ret = 1;
	else {
		if (path->buf[orig_path_len - 1] != '/')
			strbuf_addch(path, '/');
		strbuf_addstr(path, ".git");
		if (is_git_directory(path->buf))
			ret = 1;
		strbuf_setlen(path, orig_path_len);
	}
	return ret;
}
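To make the intended behavior concrete, here is a minimal standalone sketch of the same logic that can be compiled outside of git. It is only an illustration under stated assumptions: struct path_buf and its two helpers are hypothetical stand-ins for struct strbuf, and fake_is_git_directory is a stub for setup.c:is_git_directory that recognizes a single hard-coded path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity stand-in for git's struct strbuf. */
struct path_buf {
	char buf[256];
	size_t len;
};

/* Replace the buffer contents with s. */
static void path_buf_set(struct path_buf *p, const char *s)
{
	p->len = strlen(s);
	assert(p->len < sizeof(p->buf));
	memcpy(p->buf, s, p->len + 1);
}

/* Append s to the buffer. */
static void path_buf_add(struct path_buf *p, const char *s)
{
	size_t n = strlen(s);
	assert(p->len + n < sizeof(p->buf));
	memcpy(p->buf + p->len, s, n + 1);
	p->len += n;
}

/* Stub (assumption): pretends that only "repo/.git" is a git directory. */
static int fake_is_git_directory(const char *path)
{
	return !strcmp(path, "repo/.git");
}

static int is_git_repository_sketch(struct path_buf *path)
{
	int ret;
	size_t orig_len = path->len;

	/* Empty input is "not a repository"; buf[orig_len - 1] is never read. */
	if (orig_len == 0)
		return 0;
	if (fake_is_git_directory(path->buf))
		return 1;

	/* Probe "<path>/.git", adding the separator only when it is missing. */
	if (path->buf[orig_len - 1] != '/')
		path_buf_add(path, "/");
	path_buf_add(path, ".git");
	ret = fake_is_git_directory(path->buf);

	/* Restore the caller's path before returning. */
	path->buf[orig_len] = '\0';
	path->len = orig_len;
	return ret;
}
```

With this stub, an empty path and "other" are reported as not being repositories, while "repo", "repo/" and "repo/.git" are, and in every case the buffer holds the original path again on return.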
Also I borrowed this pattern from remove_dirs and it has the same
problem. Should I add something like this as a separate commit?
diff --git a/builtin/clean.c b/builtin/clean.c
index ccffd8a..88850e3 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -173,7 +173,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	DIR *dir;
 	struct strbuf quoted = STRBUF_INIT;
 	struct dirent *e;
-	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
+	int res = 0, ret = 0, gone = 1;
+	size_t original_len = path->len, len;
 	struct string_list dels = STRING_LIST_INIT_DUP;
 	*dir_gone = 1;
@@ -201,6 +202,7 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 		return res;
 	}
+	assert(original_len > 0 && "expects non-empty path");
 	if (path->buf[original_len - 1] != '/')
 		strbuf_addch(path, '/');
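As a side note on why the guard still matters once the length is a size_t: with an int, a zero length yields index -1, whereas with size_t the subtraction wraps around to SIZE_MAX, so the access is out of bounds either way. The following standalone sketch illustrates this; last_index and needs_trailing_slash are hypothetical helper names used only for this example and do not exist in git.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: index of the last character of a path of the
 * given length. Unsigned arithmetic wraps, so len == 0 gives SIZE_MAX.
 */
static size_t last_index(size_t len)
{
	return len - 1;
}

/*
 * Hypothetical helper mirroring the guarded check in the diff above:
 * returns 1 when a '/' would have to be appended to the path.
 */
static int needs_trailing_slash(const char *buf, size_t len)
{
	assert(len > 0 && "expects non-empty path");
	return buf[len - 1] != '/';
}
```

Here last_index(0) evaluates to SIZE_MAX rather than a negative number, which is why the assert (or an explicit length check) is needed regardless of the type change.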
>> +			strbuf_addch(path, '/');
>> +		strbuf_addstr(path, ".git");
>> +		if (is_git_directory(path->buf))
>> +			ret = 1;
>> +		strbuf_setlen(path, orig_path_len);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
>>  		int dry_run, int quiet, int *dir_gone)
>>  {