From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: phillip.wood@dunelm.org.uk
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, Brian Norris <briannorris@chromium.org>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH] t3429: try to protect against a potential racy todo file problem
Date: Mon, 25 Nov 2019 16:15:17 +0100 [thread overview]
Message-ID: <20191125151517.GE23183@szeder.dev> (raw)
In-Reply-To: <5a43a071-a3c2-770e-bca4-3e73aff96e48@gmail.com>
On Mon, Nov 25, 2019 at 02:43:07PM +0000, Phillip Wood wrote:
> On 25/11/2019 13:18, SZEDER Gábor wrote:
> >On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
> >>To notice a changed todo file the sequencer stores the file's stat
> >>data in its 'struct todo_list' instance, and compares it with the
> >>file's current stat data after 'reword', 'squash' and 'exec'
> >>instructions. If the two stat data doesn't match, it re-reads the
> >>todo file.
> >>
> >>Sounds simple, but there are some subtleties going on here:
> >>
> >> - The 'struct todo_list' holds the stat data from the time when the
> >> todo file was last read.
> >>
> >> - This stat data in 'struct todo_list' is not updated when the
> >> sequencer itself writes the todo file.
> >>
> >> - Before executing each instruction during an interactive rebase,
> >> the sequencer always updates the todo file by removing the
> >> just-about-to-be-executed instruction. This changes the file's
> >> size and inode [1].
> >>
> >>Consequently, when the sequencer looks at the stat data after a
> >>'reword', 'squash' or 'exec' instruction, it most likely finds that
> >>they differ, even when the user didn't modify the todo list at all!
> >>This is not an issue in practice, it just wastes a few cycles on
> >>re-reading the todo list that matches what the sequencer already has
> >>in memory anyway.
> >
> >It can be much more than just a few cycles, because the total number
> >of parsed instructions from all the todo file re-reads can go
> >quadratic with the number of rebased commits.
> >
> >The simple test below runs 'git rebase -i -x' on 1000 commits, which
> >takes over 14seconds to run. If it doesn't re-read the todo file at
> >all (I simply deleted the whole condition block checking the stat data
> >and re-reading) it runs for only ~2.5secs.
> >
> >Just another angle to consider...
>
> I know dscho was keen to avoid re-parsing the list all the time [1]
> presumably because of the quadratic behavior. (He also assumed most people
> were using ns stat times [2] but that appears not to be the case)
Nanosecond file timestamp comparisons are only enabled by the USE_NSEC
macro, which is only defined if the USE_NSEC Makefile knob is enabled,
but that is not enabled by default.
Then there is the related NO_NSEC Makefile knob:
# Define NO_NSEC if your "struct stat" does not have "st_ctim.tv_nsec"
# available. This automatically turns USE_NSEC off.
As Dscho mentioned in [2], we do disable nanosecond file timestamp
comparisons in 'config.mak.uname' on a handful of platforms by setting
NO_NSEC. This, however, does not mean that nanosec timestamps are
enabled on other platforms by default.
> Could we
> just compare the text of the todo list on disk to whats in todo->buf.buf
> (with an appropriate offset)? That would avoid parsing the text and looking
> up all the commits with get_oid()
Comparing the contents without parsing is still quadratic in the size
of the todo list, though I suppose with a much lower constant factor
than actually parsing it.
> [1]
> https://public-inbox.org/git/alpine.DEB.2.20.1703021617510.3767@virtualbox/
> [2]
> https://public-inbox.org/git/alpine.DEB.2.20.1704131526500.2135@virtualbox/
>
> >
> > --- >8 ---
> >
> >test_expect_success 'test' '
> > num_commits=1000 &&
> > test_commit_bulk --filename=file $num_commits &&
> >
> > /usr/bin/time -f "Elapsed time: %e" \
> > git rebase -i --root -x true 2>out &&
> >
> > grep "Executing: true" out >actual &&
> > test_line_count = $num_commits actual &&
> >
> > # show the elapsed time
> > tail -n2 out
> >'
> >
> > --- >8 ---
> >
next prev parent reply other threads:[~2019-11-25 15:15 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-22 23:10 git 2.24: git revert <commit1> <commit2> requires extra '--continue'? Brian Norris
2019-11-23 0:34 ` SZEDER Gábor
2019-11-23 9:53 ` Phillip Wood
2019-11-23 17:20 ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick SZEDER Gábor
2019-11-23 21:14 ` Johannes Schindelin
2019-11-24 4:49 ` Junio C Hamano
2019-11-24 10:44 ` Phillip Wood
2019-11-24 21:10 ` [PATCH] t3429: try to protect against a potential racy todo file problem SZEDER Gábor
2019-11-25 1:28 ` Junio C Hamano
2019-11-25 3:10 ` SZEDER Gábor
2019-11-25 13:18 ` SZEDER Gábor
2019-11-25 14:43 ` Phillip Wood
2019-11-25 15:15 ` SZEDER Gábor [this message]
2019-11-25 16:40 ` Phillip Wood
2019-11-25 1:10 ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick Junio C Hamano
2019-11-25 10:47 ` Phillip Wood
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191125151517.GE23183@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=briannorris@chromium.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.