All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jiang Xin <worldhello.net@gmail.com>
Cc: "Jiang Xin" <zhiyou.jx@alibaba-inc.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Git List" <git@vger.kernel.org>,
	"Đoàn Trần Công Danh" <congdanhqx@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: Runaway sed memory use in test on older sed+glibc (was "Re: [PATCH v6 1/3] test: add helper functions for git-bundle")
Date: Thu, 27 May 2021 14:19:04 +0200	[thread overview]
Message-ID: <87tumol4tg.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20210527115226.42539-1-zhiyou.jx@alibaba-inc.com>


On Thu, May 27 2021, Jiang Xin wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年5月27日周四
> 上午2:51写道:
>>
>>
>> On Mon, Jan 11 2021, Jiang Xin wrote:
>>
>> > From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
>> >
>> > Move git-bundle related functions from t5510 to a library, and this
>> > lib
>> > will be shared with a new testcase t6020 which finds a known
>> > breakage of
>> > "git-bundle".
>> > [...]
>> > +
>> > +# Format the output of git commands to make a user-friendly and
>> > stable
>> > +# text.  We can easily prepare the expect text without having to
>> > worry
>> > +# about future changes of the commit ID and spaces of the output.
>> > +make_user_friendly_and_stable_output () {
>> > +     sed \
>> > +             -e "s/${A%${A#???????}}[0-9a-f]*/<COMMIT-A>/g" \
>> > +             -e "s/${B%${B#???????}}[0-9a-f]*/<COMMIT-B>/g" \
>> > +             -e "s/${C%${C#???????}}[0-9a-f]*/<COMMIT-C>/g" \
>> > +             -e "s/${D%${D#???????}}[0-9a-f]*/<COMMIT-D>/g" \
>> > +             -e "s/${E%${E#???????}}[0-9a-f]*/<COMMIT-E>/g" \
>> > +             -e "s/${F%${F#???????}}[0-9a-f]*/<COMMIT-F>/g" \
>> > +             -e "s/${G%${G#???????}}[0-9a-f]*/<COMMIT-G>/g" \
>> > +             -e "s/${H%${H#???????}}[0-9a-f]*/<COMMIT-H>/g" \
>> > +             -e "s/${I%${I#???????}}[0-9a-f]*/<COMMIT-I>/g" \
>> > +             -e "s/${J%${J#???????}}[0-9a-f]*/<COMMIT-J>/g" \
>> > +             -e "s/${K%${K#???????}}[0-9a-f]*/<COMMIT-K>/g" \
>> > +             -e "s/${L%${L#???????}}[0-9a-f]*/<COMMIT-L>/g" \
>> > +             -e "s/${M%${M#???????}}[0-9a-f]*/<COMMIT-M>/g" \
>> > +             -e "s/${N%${N#???????}}[0-9a-f]*/<COMMIT-N>/g" \
>> > +             -e "s/${O%${O#???????}}[0-9a-f]*/<COMMIT-O>/g" \
>> > +             -e "s/${P%${P#???????}}[0-9a-f]*/<COMMIT-P>/g" \
>> > +             -e "s/${TAG1%${TAG1#???????}}[0-9a-f]*/<TAG-1>/g" \
>> > +             -e "s/${TAG2%${TAG2#???????}}[0-9a-f]*/<TAG-2>/g" \
>> > +             -e "s/${TAG3%${TAG3#???????}}[0-9a-f]*/<TAG-3>/g" \
>> > +             -e "s/ *\$//"
>> > +}
>>
>> On one of the gcc farm boxes, a i386 box (gcc45) this fails because
>> sed
>> gets killed after >500MB of memory use (I was just eyeballing it in
>> htop) on the "reate bundle from special rev: main^!" test. This with
>> GNU
>> sed 4.2.2.
>>
>> I suspect this regex pattern creates some runaway behavior in sed
>> that's
>> since been fixed (or maybe it's the glibc regex engine?). The glibc is
>> 2.19-18+deb8u10:
>>
>>     + git bundle list-heads special-rev.bdl
>>     + make_user_friendly_and_stable_output
>>     + sed -e s/[0-9a-f]*/<COMMIT-A>/g -e s/[0-9a-f]*/<COMMIT-B>/g -e
>> s/[0-9a-f]*/<COMMIT-C>/g -e s/[0-9a-f]*/<COMMIT-D>/g -e
>> s/[0-9a-f]*/<COMMIT-E>/g -e s/[0-9a-f]*/<COMMIT-F>/g -e
>> s/[0-9a-f]*/<COMMIT-G>/g -e s/[0-9a-f]*/<COMMIT-H>/g -e
>> s/[0-9a-f]*/<COMMIT-I>/g -e s/[0-9a-f]*/<COMMIT-J>/g -e
>> s/[0-9a-f]*/<COMMIT-K>/g -e s/[0-9a-f]*/<COMMIT-L>/g -e
>> s/[0-9a-f]*/<COMMIT-M>/g -e s/[0-9a-f]*/<COMMIT-N>/g -e
>> s/[0-9a-f]*/<COMMIT-O>/g -e s/[0-9a-f]*/<COMMIT-P>/g -e
>> s/[0-9a-f]*/<TAG-1>/g -e s/[0-9a-f]*/<TAG-2>/g -e
>> s/[0-9a-f]*/<TAG-3>/g -e s/ *$//
>>     sed: couldn't re-allocate memory
>
> I wrote a program on macOS to check memory footprint for sed and perl.
> See:
>
>     https://github.com/jiangxin/compare-sed-perl

Interesting use of Go for as a /usr/bin/time -v replacement :)

After changing your int64 to int32 and digging up how to cross-compile
Go I get similar results, it's because your test has actual short SHA-1s
in the "-e 's///g'"'s, but notice how in the trace I have it's
e.g. "s/[0-9a-f]*/<COMMIT-A>/g".

That's the problem, so that Go command won't reproduce it. Anyway,
changing the test to emit to "input" first and running this shows it:
    
    avar@gcc45:/run/user/1632/git/t/trash directory.t6020-bundle-misc$ /usr/bin/time -v sed -e 's/[0-9a-f]*/<COMMIT-A>/g' -e 's/[0-9a-f]*/<COMMIT-B>/g' -e 's/[0-9a-f]*/<COMMIT-C>/g' -e 's/[0-9a-f]*/<COMMIT-D>/g' -e 's/[0-9a-f]*/<COMMIT-E>/g' -e 's/[0-9a-f]*/<COMMIT-F>/g' -e 's/[0-9a-f]*/<COMMIT-G>/g' -e 's/[0-9a-f]*/<COMMIT-H>/g' -e 's/[0-9a-f]*/<COMMIT-I>/g' -e 's/[0-9a-f]*/<COMMIT-J>/g' -e 's/[0-9a-f]*/<COMMIT-K>/g' -e 's/[0-9a-f]*/<COMMIT-L>/g' -e 's/[0-9a-f]*/<COMMIT-M>/g' -e 's/[0-9a-f]*/<COMMIT-N>/g' -e 's/[0-9a-f]*/<COMMIT-O>/g' -e 's/[0-9a-f]*/<COMMIT-P>/g' -e 's/[0-9a-f]*/<TAG-1>/g' -e 's/[0-9a-f]*/<TAG-2>/g' -e 's/[0-9a-f]*/<TAG-3>/g' -e 's/ *$//' <input
    sed: couldn't re-allocate memory
    Command exited with non-zero status 4
            Command being timed: "sed -e s/[0-9a-f]*/<COMMIT-A>/g -e s/[0-9a-f]*/<COMMIT-B>/g -e s/[0-9a-f]*/<COMMIT-C>/g -e s/[0-9a-f]*/<COMMIT-D>/g -e s/[0-9a-f]*/<COMMIT-E>/g -e s/[0-9a-f]*/<COMMIT-F>/g -e s/[0-9a-f]*/<COMMIT-G>/g -e s/[0-9a-f]*/<COMMIT-H>/g -e s/[0-9a-f]*/<COMMIT-I>/g -e s/[0-9a-f]*/<COMMIT-J>/g -e s/[0-9a-f]*/<COMMIT-K>/g -e s/[0-9a-f]*/<COMMIT-L>/g -e s/[0-9a-f]*/<COMMIT-M>/g -e s/[0-9a-f]*/<COMMIT-N>/g -e s/[0-9a-f]*/<COMMIT-O>/g -e s/[0-9a-f]*/<COMMIT-P>/g -e s/[0-9a-f]*/<TAG-1>/g -e s/[0-9a-f]*/<TAG-2>/g -e s/[0-9a-f]*/<TAG-3>/g -e s/ *$//"
            User time (seconds): 130.00
            System time (seconds): 2.42
            Percent of CPU this job got: 100%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 2:12.41
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 1030968
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 257333
            Voluntary context switches: 1
            Involuntary context switches: 12578
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 4

But no, the issue as it turns out is not Perl v.s. Sed, it's that
there's some bug in the shellscript / tooling version (happens with both
dash 0.5.7-4 and bash 4.3-11+deb8u2 on that box) where those expansions
like ${A%${A#??????0?}} resolve to nothing.

So if we make that:

        cat >input &&
        cat input >&2 &&
        sed -e "s/${A%${A#??????0?}}[0-9a-f]*/<COMMIT-A>/g" <input >input.tmp && mv input.tmp input &&
        cat input >&2 &&
        sed -e "s/${B%${B#???????}}[0-9a-f]*/<COMMIT-B>/g" <input >input.tmp && mv input.tmp input &&
        cat input >&2 &&

We get things like:
    
    + sed -e s/[0-9a-f]*/<COMMIT-A>/g
    + mv input.tmp input
    + cat input
    <COMMIT-A> <COMMIT-A>r<COMMIT-A>s<COMMIT-A>/<COMMIT-A>h<COMMIT-A>s<COMMIT-A>/<COMMIT-A>m<COMMIT-A>i<COMMIT-A>n<COMMIT-A>
    + sed -e s/[0-9a-f]*/<COMMIT-B>/g
    + mv input.tmp input
    + cat input
    <COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B> <COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>r<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>s<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>/<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>h<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>s<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>/<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>m<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>i<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>n<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>
    [...]

etc. I.e. it's the sed expression itself that's the issue. I.e. you
should be able to reproduce this locally with something like:

    echo 0 | sed -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g'

If not just copy the -e a few more times.

Anyway, looking at this whole test file with fresh eyes this pattern
seems very strange. You duplicated most of test_commit with this
test_commit_setvar. It's a bit more verbosity but why not just use:

    test_commit ...
    A=$(git rev-parse HEAD)

Or teach test_commit a --rev-parse option or something and:

    A=$(test_commit ...)

This make_user_friendly_and_stable_output then actually loses
information, e.g. sometimes the bundle output you're testing emits
trailing spaces, but the normalization function overzelously trims that.

I think this whole thing would be much simpler with the above and then
something like:
    
    @@ -146,7 +126,8 @@ test_expect_success 'setup' '
     
            # branch main: merge commit I & J
            git checkout main &&
    -       test_commit_setvar --merge I topic/1 "Merge commit I" &&
    +       git merge --no-edit --no-ff -m"Merge commit I" topic/1 &&
    +       I=$(git rev-parse HEAD) &&
            test_commit_setvar --merge J refs/pull/2/head "Merge commit J" &&
     
            # branch main: commit K
    @@ -172,18 +153,18 @@ test_expect_success 'create bundle from special rev: main^!' '
     
            git bundle list-heads special-rev.bdl |
                    make_user_friendly_and_stable_output >actual &&
    -       cat >expect <<-\EOF &&
    -       <COMMIT-P> refs/heads/main
    +       cat >expect <<-EOF &&
    +       $P refs/heads/main
            EOF
            test_cmp expect actual &&

Or just add a --merge option to test_commit itself.

  reply	other threads:[~2021-05-27 12:49 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-03  9:54 [PATCH] bundle: arguments can be read from stdin Jiang Xin
2021-01-04 23:41 ` Junio C Hamano
2021-01-05 16:30   ` [PATCH v2 1/2] bundle: lost objects when removing duplicate pendings Jiang Xin
2021-01-05 16:30   ` [PATCH v2 2/2] bundle: arguments can be read from stdin Jiang Xin
2021-01-07 13:50   ` [PATCH v3 0/2] improvements for git-bundle Jiang Xin
2021-01-07 13:50   ` [PATCH v3 1/2] bundle: lost objects when removing duplicate pendings Jiang Xin
2021-01-07 15:37     ` Đoàn Trần Công Danh
2021-01-08 13:14       ` Jiang Xin
2021-01-08 14:45       ` [PATCH v4 0/2] Improvements for git-bundle Jiang Xin
2021-01-08 14:45       ` [PATCH v4 1/2] bundle: lost objects when removing duplicate pendings Jiang Xin
2021-01-09  2:10         ` Junio C Hamano
2021-01-09 13:32           ` Jiang Xin
2021-01-09 22:02             ` Junio C Hamano
2021-01-10 14:30               ` [PATCH v5 0/3] improvements for git-bundle Jiang Xin
2021-01-10 14:30               ` [PATCH v5 1/3] test: add helper functions " Jiang Xin
2021-01-11 20:09                 ` Junio C Hamano
2021-01-12  2:27                   ` [PATCH v6 0/3] improvements " Jiang Xin
2021-01-12  2:27                   ` [PATCH v6 1/3] test: add helper functions " Jiang Xin
2021-05-26 18:49                     ` Runaway sed memory use in test on older sed+glibc (was "Re: [PATCH v6 1/3] test: add helper functions for git-bundle") Ævar Arnfjörð Bjarmason
2021-05-27 11:52                       ` Jiang Xin
2021-05-27 12:19                         ` Ævar Arnfjörð Bjarmason [this message]
2021-05-27 13:48                           ` Jeff King
2021-05-27 19:19                           ` Felipe Contreras
2021-06-01  9:45                             ` Jiang Xin
2021-06-01  9:42                           ` Jiang Xin
2021-06-01 11:50                             ` Ævar Arnfjörð Bjarmason
2021-06-01 13:20                               ` Jiang Xin
2021-06-01 14:49                                 ` [PATCH 1/2] t6020: fix bash incompatible issue Jiang Xin
2021-06-01 14:49                                 ` [PATCH 2/2] t6020: do not mangle trailing spaces in output Jiang Xin
2021-06-05 17:02                                   ` Ævar Arnfjörð Bjarmason
2021-06-12  5:07                                     ` [PATCH v2 0/4] Fixed t6020 bash compatible issue and fixed wrong sideband suffix issue Jiang Xin
2021-06-14  4:10                                       ` Junio C Hamano
2021-06-15  3:11                                         ` Jiang Xin
2021-06-17  3:14                                           ` [PATCH v3] t6020: fix incompatible parameter expansion Jiang Xin
2021-06-21  8:41                                             ` Ævar Arnfjörð Bjarmason
2021-06-12  5:07                                     ` [PATCH v2 1/4] t6020: fix bash incompatible issue Jiang Xin
2021-06-12  5:07                                     ` [PATCH v2 2/4] test: refactor create_commits_in() for t5411 and t5548 Jiang Xin
2021-06-12  5:07                                     ` [PATCH v2 3/4] sideband: append suffix for message whose CR in next pktline Jiang Xin
2021-06-13  7:47                                       ` Ævar Arnfjörð Bjarmason
2021-06-14  3:50                                       ` Junio C Hamano
2021-06-14 11:51                                         ` Jiang Xin
2021-06-15  1:17                                           ` Junio C Hamano
2021-06-15  1:47                                             ` Jiang Xin
2021-06-15  2:11                                               ` Nicolas Pitre
2021-06-15  3:04                                                 ` Jiang Xin
2021-06-15  3:26                                                   ` Nicolas Pitre
2021-06-15  4:46                                                     ` Junio C Hamano
2021-06-15  7:17                                                       ` Jiang Xin
2021-06-15 14:46                                                       ` Nicolas Pitre
2021-06-12  5:07                                     ` [PATCH v2 4/4] test: compare raw output, not mangle tabs and spaces Jiang Xin
2021-01-12  2:27                   ` [PATCH v6 2/3] bundle: lost objects when removing duplicate pendings Jiang Xin
2021-01-12  2:27                   ` [PATCH v6 3/3] bundle: arguments can be read from stdin Jiang Xin
2021-01-10 14:30               ` [PATCH v5 2/3] bundle: lost objects when removing duplicate pendings Jiang Xin
2021-01-11 20:12                 ` Junio C Hamano
2021-01-10 14:30               ` [PATCH v5 3/3] bundle: arguments can be read from stdin Jiang Xin
2021-01-09 15:09           ` [PATCH v4 1/2] bundle: lost objects when removing duplicate pendings Jiang Xin
2021-01-09 22:02             ` Junio C Hamano
2021-01-08 14:45       ` [PATCH v4 2/2] bundle: arguments can be read from stdin Jiang Xin
2021-01-09  2:18         ` Junio C Hamano
2021-01-07 13:50   ` [PATCH v3 " Jiang Xin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tumol4tg.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=congdanhqx@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=worldhello.net@gmail.com \
    --cc=zhiyou.jx@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.