From: Brandon Williams <bmwill@google.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0
Date: Mon, 24 Apr 2017 14:34:28 -0700 [thread overview]
Message-ID: <20170424213428.GA105623@google.com> (raw)
In-Reply-To: <20170424211249.28553-1-avarab@gmail.com>
On 04/24, Ævar Arnfjörð Bjarmason wrote:
> Russ Cox just published an article about how various glob()
> implementations suffer from pathological performance when fed certain
> pathological patterns like "a*a*a*a*b" given a file like "aaaaaaa...":
> https://research.swtch.com/glob
>
> I was curious to see if this impacted git. It turns out it does. This
> used to be a per-platform issue with git, since globbing was
> implemented via fnmatch() by default before v1.8.4, and support for
> using the OS fnmatch() was removed entirely in v2.0.0.
>
> This performance test shows the regression:
>
> $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_MAKE_OPTS="[...]NO_WILDMATCH=YesPlease[...]" ./run v1.9.5 v2.0.0 v2.12.0 p0100-globbing.sh
> [...]
> Test v1.9.5 v2.0.0 v2.12.0
> ------------------------------------------------------------------------------------------------------------------------------
> [...]
> 0100.7: fileglob((a*)^nb) against file (a^100).t; n = 1 0.01(0.00+0.00) 0.00(0.00+0.00) -100.0% 0.01(0.00+0.00) +0.0%
> 0100.8: fileglob((a*)^nb) against file (a^100).t; n = 2 0.01(0.00+0.00) 0.00(0.00+0.00) -100.0% 0.01(0.00+0.00) +0.0%
> 0100.9: fileglob((a*)^nb) against file (a^100).t; n = 3 0.00(0.00+0.00) 0.00(0.00+0.00) = 0.01(0.00+0.00) +inf
> 0100.10: fileglob((a*)^nb) against file (a^100).t; n = 4 0.00(0.00+0.00) 0.01(0.01+0.00) +inf 0.02(0.02+0.00) +inf
> 0100.11: fileglob((a*)^nb) against file (a^100).t; n = 5 0.00(0.00+0.00) 0.20(0.19+0.00) +inf 0.24(0.23+0.00) +inf
> 0100.12: fileglob((a*)^nb) against file (a^100).t; n = 6 0.00(0.00+0.00) 3.03(3.00+0.00) +inf 3.08(3.05+0.00) +inf
>
> And here's a one-liner to do the same:
>
> $ time (rm -rf test; git init -q test && (cd test && touch $(perl -e 'print "a" x 100').t && git add a* && git commit -q -m"test" && git ls-files 'a*a*a*a*a*a*a*b'))
>
> Add or remove "a*"'s to adjust the runtime. With 6 that executes in 3
> seconds on my system, 40 seconds with 7 etc.
>
> I don't think this is something we need to worry much about, if you
> have a file like this and feed Git insane patterns you probably
> deserve what you get.
>
> The real concern is if we have behavior like this and ever e.g. expose
> globbing over the network, e.g. in some future upload-pack protocol.
>
> There are probably some web-based programs built on top of git that
> are vulnerable to DoS attacks as a result of this, e.g. if they take
> user-supplied globs and feed them to ls-files.
I was taking a look at wildmatch a few months ago and have an unfinished
patch to do some cleanup there. I noticed this was inefficient but
didn't expect those kinds of numbers. I wonder how difficult it would
be to rewrite it so that we don't have this issue.
--
Brandon Williams
next prev parent reply other threads:[~2017-04-24 21:35 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-24 21:12 [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0 Ævar Arnfjörð Bjarmason
2017-04-24 21:12 ` [PATCH 1/2] perf: add function to setup a fresh test repo Ævar Arnfjörð Bjarmason
2017-04-24 21:12 ` [PATCH 2/2] perf: add test showing exponential growth in path globbing Ævar Arnfjörð Bjarmason
2017-04-24 21:34 ` Brandon Williams [this message]
2017-04-24 22:45 ` [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0 Ævar Arnfjörð Bjarmason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170424213428.GA105623@google.com \
--to=bmwill@google.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).