From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jiang Xin <worldhello.net@gmail.com>
Cc: "Jiang Xin" <zhiyou.jx@alibaba-inc.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Git List" <git@vger.kernel.org>,
"Đoàn Trần Công Danh" <congdanhqx@gmail.com>,
"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: Runaway sed memory use in test on older sed+glibc (was "Re: [PATCH v6 1/3] test: add helper functions for git-bundle")
Date: Thu, 27 May 2021 14:19:04 +0200
Message-ID: <87tumol4tg.fsf@evledraar.gmail.com>
In-Reply-To: <20210527115226.42539-1-zhiyou.jx@alibaba-inc.com>
On Thu, May 27 2021, Jiang Xin wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Thu, May 27, 2021
> at 2:51 AM:
>>
>>
>> On Mon, Jan 11 2021, Jiang Xin wrote:
>>
>> > From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
>> >
>> > Move git-bundle related functions from t5510 to a library, and this lib
>> > will be shared with a new testcase t6020 which finds a known breakage of
>> > "git-bundle".
>> > [...]
>> > +
>> > +# Format the output of git commands to make a user-friendly and stable
>> > +# text. We can easily prepare the expect text without having to worry
>> > +# about future changes of the commit ID and spaces of the output.
>> > +make_user_friendly_and_stable_output () {
>> > + sed \
>> > + -e "s/${A%${A#???????}}[0-9a-f]*/<COMMIT-A>/g" \
>> > + -e "s/${B%${B#???????}}[0-9a-f]*/<COMMIT-B>/g" \
>> > + -e "s/${C%${C#???????}}[0-9a-f]*/<COMMIT-C>/g" \
>> > + -e "s/${D%${D#???????}}[0-9a-f]*/<COMMIT-D>/g" \
>> > + -e "s/${E%${E#???????}}[0-9a-f]*/<COMMIT-E>/g" \
>> > + -e "s/${F%${F#???????}}[0-9a-f]*/<COMMIT-F>/g" \
>> > + -e "s/${G%${G#???????}}[0-9a-f]*/<COMMIT-G>/g" \
>> > + -e "s/${H%${H#???????}}[0-9a-f]*/<COMMIT-H>/g" \
>> > + -e "s/${I%${I#???????}}[0-9a-f]*/<COMMIT-I>/g" \
>> > + -e "s/${J%${J#???????}}[0-9a-f]*/<COMMIT-J>/g" \
>> > + -e "s/${K%${K#???????}}[0-9a-f]*/<COMMIT-K>/g" \
>> > + -e "s/${L%${L#???????}}[0-9a-f]*/<COMMIT-L>/g" \
>> > + -e "s/${M%${M#???????}}[0-9a-f]*/<COMMIT-M>/g" \
>> > + -e "s/${N%${N#???????}}[0-9a-f]*/<COMMIT-N>/g" \
>> > + -e "s/${O%${O#???????}}[0-9a-f]*/<COMMIT-O>/g" \
>> > + -e "s/${P%${P#???????}}[0-9a-f]*/<COMMIT-P>/g" \
>> > + -e "s/${TAG1%${TAG1#???????}}[0-9a-f]*/<TAG-1>/g" \
>> > + -e "s/${TAG2%${TAG2#???????}}[0-9a-f]*/<TAG-2>/g" \
>> > + -e "s/${TAG3%${TAG3#???????}}[0-9a-f]*/<TAG-3>/g" \
>> > + -e "s/ *\$//"
>> > +}
>>
>> On one of the gcc farm boxes, an i386 box (gcc45), this fails because sed
>> gets killed after >500MB of memory use (I was just eyeballing it in
>> htop) on the "create bundle from special rev: main^!" test. This is with
>> GNU sed 4.2.2.
>>
>> I suspect this regex pattern creates some runaway behavior in sed that's
>> since been fixed (or maybe it's the glibc regex engine?). The glibc is
>> 2.19-18+deb8u10:
>>
>> + git bundle list-heads special-rev.bdl
>> + make_user_friendly_and_stable_output
>> + sed -e s/[0-9a-f]*/<COMMIT-A>/g -e s/[0-9a-f]*/<COMMIT-B>/g -e
>> s/[0-9a-f]*/<COMMIT-C>/g -e s/[0-9a-f]*/<COMMIT-D>/g -e
>> s/[0-9a-f]*/<COMMIT-E>/g -e s/[0-9a-f]*/<COMMIT-F>/g -e
>> s/[0-9a-f]*/<COMMIT-G>/g -e s/[0-9a-f]*/<COMMIT-H>/g -e
>> s/[0-9a-f]*/<COMMIT-I>/g -e s/[0-9a-f]*/<COMMIT-J>/g -e
>> s/[0-9a-f]*/<COMMIT-K>/g -e s/[0-9a-f]*/<COMMIT-L>/g -e
>> s/[0-9a-f]*/<COMMIT-M>/g -e s/[0-9a-f]*/<COMMIT-N>/g -e
>> s/[0-9a-f]*/<COMMIT-O>/g -e s/[0-9a-f]*/<COMMIT-P>/g -e
>> s/[0-9a-f]*/<TAG-1>/g -e s/[0-9a-f]*/<TAG-2>/g -e
>> s/[0-9a-f]*/<TAG-3>/g -e s/ *$//
>> sed: couldn't re-allocate memory
>
> I wrote a program on macOS to check memory footprint for sed and perl.
> See:
>
> https://github.com/jiangxin/compare-sed-perl
Interesting use of Go as a /usr/bin/time -v replacement :)
After changing your int64 to int32 and figuring out how to cross-compile
Go, I get results similar to yours. That's because your test has actual
short SHA-1s in the "-e 's///g'" expressions, but notice that in the trace
I posted they are e.g. "s/[0-9a-f]*/<COMMIT-A>/g", with no hash prefix.
That's the problem, so that Go program won't reproduce it. Anyway,
changing the test to write to a file named "input" first and running this
shows it:
avar@gcc45:/run/user/1632/git/t/trash directory.t6020-bundle-misc$ /usr/bin/time -v sed -e 's/[0-9a-f]*/<COMMIT-A>/g' -e 's/[0-9a-f]*/<COMMIT-B>/g' -e 's/[0-9a-f]*/<COMMIT-C>/g' -e 's/[0-9a-f]*/<COMMIT-D>/g' -e 's/[0-9a-f]*/<COMMIT-E>/g' -e 's/[0-9a-f]*/<COMMIT-F>/g' -e 's/[0-9a-f]*/<COMMIT-G>/g' -e 's/[0-9a-f]*/<COMMIT-H>/g' -e 's/[0-9a-f]*/<COMMIT-I>/g' -e 's/[0-9a-f]*/<COMMIT-J>/g' -e 's/[0-9a-f]*/<COMMIT-K>/g' -e 's/[0-9a-f]*/<COMMIT-L>/g' -e 's/[0-9a-f]*/<COMMIT-M>/g' -e 's/[0-9a-f]*/<COMMIT-N>/g' -e 's/[0-9a-f]*/<COMMIT-O>/g' -e 's/[0-9a-f]*/<COMMIT-P>/g' -e 's/[0-9a-f]*/<TAG-1>/g' -e 's/[0-9a-f]*/<TAG-2>/g' -e 's/[0-9a-f]*/<TAG-3>/g' -e 's/ *$//' <input
sed: couldn't re-allocate memory
Command exited with non-zero status 4
Command being timed: "sed -e s/[0-9a-f]*/<COMMIT-A>/g -e s/[0-9a-f]*/<COMMIT-B>/g -e s/[0-9a-f]*/<COMMIT-C>/g -e s/[0-9a-f]*/<COMMIT-D>/g -e s/[0-9a-f]*/<COMMIT-E>/g -e s/[0-9a-f]*/<COMMIT-F>/g -e s/[0-9a-f]*/<COMMIT-G>/g -e s/[0-9a-f]*/<COMMIT-H>/g -e s/[0-9a-f]*/<COMMIT-I>/g -e s/[0-9a-f]*/<COMMIT-J>/g -e s/[0-9a-f]*/<COMMIT-K>/g -e s/[0-9a-f]*/<COMMIT-L>/g -e s/[0-9a-f]*/<COMMIT-M>/g -e s/[0-9a-f]*/<COMMIT-N>/g -e s/[0-9a-f]*/<COMMIT-O>/g -e s/[0-9a-f]*/<COMMIT-P>/g -e s/[0-9a-f]*/<TAG-1>/g -e s/[0-9a-f]*/<TAG-2>/g -e s/[0-9a-f]*/<TAG-3>/g -e s/ *$//"
User time (seconds): 130.00
System time (seconds): 2.42
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:12.41
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1030968
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 257333
Voluntary context switches: 1
Involuntary context switches: 12578
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 4
But no, as it turns out the issue is not Perl vs. sed. There's some bug in
the shell / tooling versions on that box (it happens with both
dash 0.5.7-4 and bash 4.3-11+deb8u2) where expansions like
${A%${A#???????}} resolve to nothing.
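For reference, the nested expansion is meant to yield the first seven
characters of the object ID. Here is a minimal sketch of the same
computation split into two steps, so that no pattern is nested inside
another expansion. The value of A is a made-up placeholder, not a commit
from the test, and "rest", "short" and "empty" are names chosen for this
illustration only:

```shell
# Hypothetical object ID, for illustration only.
A=0123456789abcdef0123456789abcdef01234567

# Step 1: strip the first seven characters, leaving the tail.
rest=${A#???????}

# Step 2: strip that tail (quoted, so it is taken literally) from the end,
# leaving the seven-character prefix.
short=${A%"$rest"}

printf 'short=%s\n' "$short"

# If the prefix comes out empty, the sed expression degenerates to one
# whose pattern can match the empty string.
empty=
printf 's/%s[0-9a-f]*/<COMMIT-A>/g\n' "$empty"
```

The second printf shows the shape of the expression seen in the trace
when the prefix is missing.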
So if we change the function to run each sed separately and dump the
intermediate result to stderr:
    cat >input &&
    cat input >&2 &&
    sed -e "s/${A%${A#???????}}[0-9a-f]*/<COMMIT-A>/g" <input >input.tmp && mv input.tmp input &&
    cat input >&2 &&
    sed -e "s/${B%${B#???????}}[0-9a-f]*/<COMMIT-B>/g" <input >input.tmp && mv input.tmp input &&
    cat input >&2 &&
We get things like:
+ sed -e s/[0-9a-f]*/<COMMIT-A>/g
+ mv input.tmp input
+ cat input
<COMMIT-A> <COMMIT-A>r<COMMIT-A>s<COMMIT-A>/<COMMIT-A>h<COMMIT-A>s<COMMIT-A>/<COMMIT-A>m<COMMIT-A>i<COMMIT-A>n<COMMIT-A>
+ sed -e s/[0-9a-f]*/<COMMIT-B>/g
+ mv input.tmp input
+ cat input
<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B> <COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>r<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>s<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>/<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>h<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>s<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>/<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>m<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>i<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>n<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>
[...]
etc. I.e. it's the sed expression itself that's the issue: with an empty
prefix the pattern also matches the empty string, so each -e inserts its
replacement all over the line and the next -e then does the same to that
longer line. You should be able to reproduce this locally with something
like:
echo 0 | sed -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g'
If that doesn't do it, just copy the -e a few more times.
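To watch the growth without running out of memory, here is a small sketch
that applies the same substitution one pass at a time and prints the
length of the line after each pass. It assumes that feeding the output
back in behaves like chaining -e expressions, and the pass count of 3 is
an arbitrary choice to keep the output small:

```shell
# Feed the output of each pass back in, as consecutive -e expressions do.
line=0
for pass in 1 2 3
do
	line=$(printf '%s\n' "$line" | sed -e 's/[0-9]*/<BEGIN>0<END>/g')
	printf 'after pass %s: %s characters\n' "$pass" "${#line}"
done
```

The exact lengths may differ between sed implementations, but the line
should grow by roughly an order of magnitude per pass.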
Anyway, looking at this whole test file with fresh eyes, this pattern
seems very strange. You duplicated most of test_commit with this
test_commit_setvar. It's a bit more verbose, but why not just use:
    test_commit ...
    A=$(git rev-parse HEAD)
Or teach test_commit a --rev-parse option or something and:
    A=$(test_commit ...)
This make_user_friendly_and_stable_output then actually loses
information, e.g. sometimes the bundle output you're testing emits
trailing spaces, but the normalization function overzealously trims them.
I think this whole thing would be much simpler with the above and then
something like:
@@ -146,7 +126,8 @@ test_expect_success 'setup' '
# branch main: merge commit I & J
git checkout main &&
- test_commit_setvar --merge I topic/1 "Merge commit I" &&
+ git merge --no-edit --no-ff -m"Merge commit I" topic/1 &&
+ I=$(git rev-parse HEAD) &&
test_commit_setvar --merge J refs/pull/2/head "Merge commit J" &&
# branch main: commit K
@@ -172,18 +153,18 @@ test_expect_success 'create bundle from special rev: main^!' '
git bundle list-heads special-rev.bdl |
make_user_friendly_and_stable_output >actual &&
- cat >expect <<-\EOF &&
- <COMMIT-P> refs/heads/main
+ cat >expect <<-EOF &&
+ $P refs/heads/main
EOF
test_cmp expect actual &&
Or just add a --merge option to test_commit itself.
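To illustrate the point about lost information: a sketch with made-up
lines (not actual "git bundle" output) showing that once trailing spaces
are stripped, two different outputs compare equal, so a test can no
longer tell them apart. The "normalize" helper is a hypothetical stand-in
for the last -e in make_user_friendly_and_stable_output:

```shell
# Two hypothetical outputs that differ only in a trailing space.
with_space='<COMMIT-P> refs/heads/main '
without_space='<COMMIT-P> refs/heads/main'

# Stand-in for the trailing-space stripping in the helper function.
normalize () {
	sed -e 's/ *$//'
}

a=$(printf '%s\n' "$with_space" | normalize)
b=$(printf '%s\n' "$without_space" | normalize)

if test "$with_space" != "$without_space" && test "$a" = "$b"
then
	echo "difference hidden by normalization"
fi
```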