git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brandon Williams <bmwill@google.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0
Date: Mon, 24 Apr 2017 14:34:28 -0700	[thread overview]
Message-ID: <20170424213428.GA105623@google.com> (raw)
In-Reply-To: <20170424211249.28553-1-avarab@gmail.com>

On 04/24, Ævar Arnfjörð Bjarmason wrote:
> Russ Cox just published an article about how various glob()
> implementations suffer from pathological performance when fed certain
> pathological patterns like "a*a*a*a*b" given a file like "aaaaaaa...":
> https://research.swtch.com/glob
> 
> I was curious to see if this impacted git. It turns out it does. This
> used to be a per-platform issue with git, since globbing was
> implemented via fnmatch() by default before v1.8.4, and support for
> using the OS fnmatch() was removed entirely in v2.0.0.
> 
> This performance test shows the regression:
> 
>     $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_MAKE_OPTS="[...]NO_WILDMATCH=YesPlease[...]" ./run v1.9.5 v2.0.0 v2.12.0 p0100-globbing.sh
>     [...]
>     Test                                                       v1.9.5            v2.0.0                    v2.12.0
>     ------------------------------------------------------------------------------------------------------------------------------
>     [...]
>     0100.7: fileglob((a*)^nb) against file (a^100).t; n = 1    0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
>     0100.8: fileglob((a*)^nb) against file (a^100).t; n = 2    0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
>     0100.9: fileglob((a*)^nb) against file (a^100).t; n = 3    0.00(0.00+0.00)   0.00(0.00+0.00) =         0.01(0.00+0.00) +inf
>     0100.10: fileglob((a*)^nb) against file (a^100).t; n = 4   0.00(0.00+0.00)   0.01(0.01+0.00) +inf      0.02(0.02+0.00) +inf
>     0100.11: fileglob((a*)^nb) against file (a^100).t; n = 5   0.00(0.00+0.00)   0.20(0.19+0.00) +inf      0.24(0.23+0.00) +inf
>     0100.12: fileglob((a*)^nb) against file (a^100).t; n = 6   0.00(0.00+0.00)   3.03(3.00+0.00) +inf      3.08(3.05+0.00) +inf
> 
> And here's a one-liner to do the same:
> 
>     $ time (rm -rf test; git init -q test && (cd test && touch $(perl -e 'print "a" x 100').t && git add a* && git commit -q -m"test" && git ls-files 'a*a*a*a*a*a*a*b'))
> 
> Add or remove "a*"'s to adjust the runtime. With 6 that executes in 3
> seconds on my system, 40 seconds with 7 etc.
> 
> I don't think this is something we need to worry much about, if you
> have a file like this and feed Git insane patterns you probably
> deserve what you get.
> 
> The real concern is if we have behavior like this and ever e.g. expose
> globbing over the network, e.g. in some future upload-pack protocol.
> 
> There are probably some web-based programs built on top of git that
> are vulnerable to DoS attacks as a result of this, e.g. if they take
> user-supplied globs and feed them to ls-files.

I was taking a look at wildmatch a few months ago and have an unfinished
patch to do some cleanup there.  I noticed this was inefficient but
didn't expect those kinds of numbers.  I wonder how difficult it would
be to rewrite it so that we don't have this issue.

-- 
Brandon Williams

  parent reply	other threads:[~2017-04-24 21:35 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-24 21:12 [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0 Ævar Arnfjörð Bjarmason
2017-04-24 21:12 ` [PATCH 1/2] perf: add function to setup a fresh test repo Ævar Arnfjörð Bjarmason
2017-04-24 21:12 ` [PATCH 2/2] perf: add test showing exponential growth in path globbing Ævar Arnfjörð Bjarmason
2017-04-24 21:34 ` Brandon Williams [this message]
2017-04-24 22:45   ` [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0 Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170424213428.GA105623@google.com \
    --to=bmwill@google.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).