git.vger.kernel.org archive mirror
From: "erik elfström" <erik.elfstrom@gmail.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH 3/3] clean: improve performance when removing lots of directories
Date: Tue, 7 Apr 2015 21:55:24 +0200
Message-ID: <CAMpP7NYixn4491EdPTDX+RQFr3VZfuAoUWZ4JXuYg2rqp9XTeg@mail.gmail.com>
In-Reply-To: <CAPig+cQOLJcy-QuACrvd+XrCpP74k0SXxj0rBkNneG5Ovnf47Q@mail.gmail.com>

On Tue, Apr 7, 2015 at 12:10 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Mon, Apr 6, 2015 at 7:48 AM, Erik Elfström <erik.elfstrom@gmail.com> wrote:
>> Before this change, clean used resolve_gitlink_ref to check for the
>> presence of nested git repositories. This had the drawback of creating
>> a ref_cache entry for every directory that should potentially be
>> cleaned. The linear search through the ref_cache list caused a massive
>> performance hit for large number of directories.
>>
>> Teach clean.c:remove_dirs to use setup.c:is_git_directory
>> instead. is_git_directory will actually open HEAD and parse the HEAD
>> ref but this implies a nested git repository and should be rare when
>> cleaning.
>>
>> Using is_git_directory should give a more standardized check for what
>> is and what isn't a git repository but also gives a slight behavioral
>> change. We will now detect and respect bare and empty nested git
>> repositories (only init run). Update t7300 to reflect this.
>>
>> The time to clean an untracked directory containing 100000 sub
>> directories went from 61s to 1.7s after this change.
>
> Impressive.
>
>> Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
>> Helped-by: Jeff King <peff@peff.net>
>
> It is customary for your sign-off to be last.
>
> More below...
>
>> ---
>> diff --git a/builtin/clean.c b/builtin/clean.c
>> index 98c103f..e951bd9 100644
>> --- a/builtin/clean.c
>> +++ b/builtin/clean.c
>> @@ -148,6 +147,24 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
>>         return 0;
>>  }
>>
>> +static int is_git_repository(struct strbuf *path)
>> +{
>> +       int ret = 0;
>> +       if (is_git_directory(path->buf))
>> +               ret = 1;
>> +       else {
>> +               int orig_path_len = path->len;
>> +               if (path->buf[orig_path_len - 1] != '/')
>
> Minor: I don't know how others feel about it, but I always find it a
> bit disturbing to see a potential negative array access without a
> safety check that orig_path_len is not 0, either directly in the
> conditional or as a documenting assert().
>


I think I would prefer to accept empty input and return false rather
than assert. What do you think about:

static int is_git_repository(struct strbuf *path)
{
    int ret = 0;
    size_t orig_path_len = path->len;
    if (orig_path_len == 0)
        ret = 0;
    else if (is_git_directory(path->buf))
        ret = 1;
    else {
        if (path->buf[orig_path_len - 1] != '/')
            strbuf_addch(path, '/');
        strbuf_addstr(path, ".git");
        if (is_git_directory(path->buf))
            ret = 1;
        strbuf_setlen(path, orig_path_len);
    }

    return ret;
}
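
For anyone who wants to try the empty-path guard outside the git tree,
here is a minimal standalone sketch of the same append-and-restore
pattern. It is only an illustration under stated assumptions: struct
pathbuf, pathbuf_set, pathbuf_add and looks_like_git_dir are
hypothetical stand-ins for strbuf and is_git_directory, not git's API,
and the predicate simply recognizes the literal path "repo/.git".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's struct strbuf: fixed capacity,
 * always NUL-terminated. */
struct pathbuf {
	char buf[256];
	size_t len;
};

/* Replace the contents with s, truncating if it does not fit. */
static void pathbuf_set(struct pathbuf *p, const char *s)
{
	size_t n = strlen(s);
	if (n >= sizeof(p->buf))
		n = sizeof(p->buf) - 1;
	memcpy(p->buf, s, n);
	p->buf[n] = '\0';
	p->len = n;
}

/* Append s, truncating if it does not fit. */
static void pathbuf_add(struct pathbuf *p, const char *s)
{
	size_t n = strlen(s);
	if (p->len + n >= sizeof(p->buf))
		n = sizeof(p->buf) - 1 - p->len;
	memcpy(p->buf + p->len, s, n);
	p->len += n;
	p->buf[p->len] = '\0';
}

/* Hypothetical predicate standing in for is_git_directory(). */
static int looks_like_git_dir(const char *path)
{
	return strcmp(path, "repo/.git") == 0;
}

/* Same shape as the proposed is_git_repository(): empty input is
 * rejected up front, and the path is restored before returning. */
static int is_nested_repo(struct pathbuf *path)
{
	size_t orig_len = path->len;
	int ret;

	if (orig_len == 0)
		return 0; /* orig_len - 1 would wrap around for size_t */
	if (looks_like_git_dir(path->buf))
		return 1;

	if (path->buf[orig_len - 1] != '/')
		pathbuf_add(path, "/");
	pathbuf_add(path, ".git");
	ret = looks_like_git_dir(path->buf);

	/* Restore the caller's path, like strbuf_setlen(). */
	path->len = orig_len;
	path->buf[orig_len] = '\0';
	return ret;
}
```

Returning early for the empty path keeps the function total: callers
get "not a repository" instead of an out-of-bounds read.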


Also, I borrowed this pattern from remove_dirs, which has the same
problem. Should I add something like this as a separate commit?

diff --git a/builtin/clean.c b/builtin/clean.c
index ccffd8a..88850e3 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -173,7 +173,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
        DIR *dir;
        struct strbuf quoted = STRBUF_INIT;
        struct dirent *e;
-       int res = 0, ret = 0, gone = 1, original_len = path->len, len;
+       int res = 0, ret = 0, gone = 1;
+       size_t original_len = path->len, len;
        struct string_list dels = STRING_LIST_INIT_DUP;

        *dir_gone = 1;
@@ -201,6 +202,7 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
                return res;
        }

+       assert(original_len > 0 && "expects non-empty path");
        if (path->buf[original_len - 1] != '/')
                strbuf_addch(path, '/');
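
To illustrate why the length check matters more once the length is a
size_t: with an unsigned type, computing len - 1 for an empty path is
well-defined but wraps to SIZE_MAX, so the index lands far outside the
buffer instead of at -1. The helper names below are hypothetical and
only serve as a small sketch of that behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the last byte, computed without any guard. For len == 0
 * the unsigned subtraction wraps around to SIZE_MAX. */
static size_t last_index_unchecked(size_t len)
{
	return len - 1;
}

/* Guarded variant: returns 0 and leaves *out untouched when there is
 * no last byte, otherwise stores its index and returns 1. */
static int last_index_checked(size_t len, size_t *out)
{
	if (len == 0)
		return 0;
	*out = len - 1;
	return 1;
}
```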


>> +                       strbuf_addch(path, '/');
>> +               strbuf_addstr(path, ".git");
>> +               if (is_git_directory(path->buf))
>> +                       ret = 1;
>> +               strbuf_setlen(path, orig_path_len);
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>>  static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
>>                 int dry_run, int quiet, int *dir_gone)
>>  {

Thread overview: 14+ messages
2015-04-06 11:48 [PATCH 0/3] Improving performance of git clean Erik Elfström
2015-04-06 11:48 ` [PATCH 1/3] t7300: add tests to document behavior of clean and nested git Erik Elfström
2015-04-06 22:06   ` Eric Sunshine
2015-04-07 19:27     ` erik elfström
2015-04-07 19:40   ` Eric Sunshine
2015-04-07 19:53     ` Torsten Bögershausen
2015-04-06 11:48 ` [PATCH 2/3] p7300: added performance tests for clean Erik Elfström
2015-04-06 20:40   ` Torsten Bögershausen
2015-04-06 22:09     ` Eric Sunshine
2015-04-07 19:35       ` erik elfström
2015-04-06 11:48 ` [PATCH 3/3] clean: improve performance when removing lots of directories Erik Elfström
2015-04-06 22:10   ` Eric Sunshine
2015-04-07 19:55     ` erik elfström [this message]
2015-04-08 21:29       ` Eric Sunshine
