All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, Stefan Beller <sbeller@google.com>,
	Derrick Stolee <stolee@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
Date: Fri, 08 Jun 2018 13:59:56 +0200	[thread overview]
Message-ID: <867en9pjyb.fsf@gmail.com> (raw)
In-Reply-To: <20180531174024.124488-1-dstolee@microsoft.com> (Derrick Stolee's message of "Thu, 31 May 2018 17:40:39 +0000")

Derrick Stolee <dstolee@microsoft.com> writes:

> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.

I like to think about those features as providing an overlay for the
commit graph (similar to overlay filesystems, like overlayfs); in the
case of git-replace quite literally.

I will be calling all those features "history-altering features", for
short.

> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck' is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectify the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.

What I would like to see here in cover letter is a generic description
of _strategy_ to deal with those history-altering features.

From what I understand you have the following options for each of
replacements, shallow clones and grafts:
 - turn off serialized commit-graph if given history-altering feature is
   present, as if core.commitGraph was set to false
 - invalidate and optionally refresh commit-graph file if given
   history-altering feature is present (maybe only if it changed the
   view of the history, is such check is possible)

For automatic invalidation you would need to have either:
 - cover all possible ways by which given history-altering feature can
   change the view of history, or
 - keep state of history-altering change for which commit-graph was
   created (e.g. in proposed VAL4 chunk) and compare with current view
   of history if it changed

For automatic turning off you would need only to check if
history-altering feature is present.


Let us examine each of those history-altering features that Git
supports:

* shallow clone

  - shallow clone usually means having shorter history, so serialized
    commit-graph is not needed as much; also changing the depth
    screws-up assumptions about generation numbers

  - there are only a few entry points changing the view of history,
    namely fetch and push with shallow options (--depth=<depth>,
    --deepen=<depth>, --shallow-since=<date>, --update-shallow,
    --shallow-exclude=<revision>, --unshallow)

  - it is easy to check for presence of shallow clone feature by
    chacking of $GIT_DIR/shallow exists and is not empty

  - different contents of $GIT_DIR/shallow means different view of
    history

  - NOTE: internally uses grafts mechanism (emulated grafts)

* replacements (replace objects, git-replace)

  - git-replace can be used to join current development repository with
    historical repository, in which case we would certainly want to make
    use of serialied commit graph; on the other hand the replacements do
    not necessary alter the view of the history

  - theoretically you could create replacement refs by hand, but in
    practice there are TWO ways of getting them:

    - using git-replace command to create, edit/change and delete
      replacement objects'
      '
    - fetching or having pushed-to refs in refs/replace/* namespace

  - you need to check if there are any refs in refs/replace/* namespace
    to check if the feature is used (but it doesn't necessarily mean
    that it altered project history)

  - changed list of refs in refs/replace/* namespace (which you can get
    with for-each-ref command/API) does not necessarily mean that the
    view of the history changed: you can replace non-commit object, you
    can replace commits and not change their parents; it is not as easy
    as checking if file exists

  - NOTE: the feature can be turned off manually with
    GIT_NO_REPLACE_OBJECTS environment variable and with
    --no-replace-objects option to git wrapper.

    Also when pushing, fetching and fsck-ing this feature is turned off
    and refs in refs/replace/* namespace are treated as ordinary refs.

    This may mean that we would want to create commit-graph with
    replacements for ordinary non-bare repository, and without
    replacements for server-side bare repository.

* grafts

  - subset of uses of git-replace, older and *obsolete* feature (because
    it is unsafe to use; that is you need to be careful with it).

  - edited by hand, no automatic entry points

  - if $GIT_DIR/info/grafts file is present, then feature is enabled
    (barring some corner cases, like empty file or file consisting only
    of comments)

  - changed contents of this file means changed view of history (well,
    except for reordering lines, or removing/adding empty lines and
    comments)

>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is acheived by calling
> write_commit_graph_reachable() from `git fetch` and `git commit`. I
> believe this is the only dependence on ds/commit-graph-fsck. The other
> commits should only depend on ds/commit-graph-lockfile-fix.

That's clever way of increasing coverage.

> Running the full test suite after these DO NO MERGE commits results in
> the following test failures, which I categorize by type of failure.
>
> The following tests are red herrings. Most work by deleting a commit
> from the object database and trying to demonstrate a failure. Others
> rely on how certain objects are loaded. These are not bugs, but will
> add noise if you run the tests suite with this patch.
>
> t0410-partial-clone.sh		Failed tests:  5, 8
> t5307-pack-missing-commit.sh		Failed tests:  3-4
> t6011-rev-list-with-bad-commit.sh	Failed test:  4
> t6024-recursive-merge.sh		Failed test:  4

Does this means that those tests should be in the end (i.e. when
core.commitGraph is turned on by defult) be simply run with
core.commitGraph explicitly disabled for the test?

> The following tests are fixed in "commit-graph: enable replace-object
> and grafts".

Would it make sense to split this commit into two dealing separately
with replace objects and with grafts?  Or do they use the same
underlying API?

> t6001-rev-list-graft.sh		Failed tests:  3, 5, 7, 9, 11, 13

O.K.  It might be good idea to add separate test that does the same, but
with git-replace instead of grafts, though.

> t6050-replace.sh			Failed tests:  11-15, 23-24, 29

The t6050-replace.sh does not test changing the DAG of revision
(excluding changing the SHA-1 of commit), if I read it correctly.  It
would be good to test using git-replace to change committer date, and to
change parents: shortening or lengthening history (e.g. emulating
joining two independent lines of development in the latter case).

See also comment about t6001 above.

>
> The following tests involve shallow clones.
>
> t5500-fetch-pack.sh			Failed tests:  30-31, 34, 348-349
> t5537-fetch-shallow.sh		Failed tests:  4-7, 9
> t5802-connect-helper.sh		Failed test:  3
>
> These tests are mostly fixed by three commits:
>
> * commit-graph: avoid writing when repo is shallow
> * fetch: destroy commit graph on shallow parameters

Seems O.K. from the subject, but I'd have to check the details.

> * commit-graph: revert to odb on missing parents

I wonder if reverting to using object database is a good solution, and
if it wouldn't be better to invalidate / delete commit-graph file
instead...

>
> Each of these cases cover a different interaction that occurs with
> shallow clones. Some are due to a commit becoming shallow, while others
> are due to a commit becoming unshallow (and hence invalidating its
> generation number).

Why do not simply check if repository is shallow?

>
> After these changes, there is one test case that still fails, and I
> cannot understand why:
>
> t5500-fetch-pack.sh			Failed test:  348
>
> This test fails when performing a `git fetch --shallow-exclude`. When I
> halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
> directory to replay the fetch it performs as expected. After struggling
> with it for a while, I figured I should just put this series up for
> discussion. Maybe someone with more experience in shallow clones can
> point out the obvious issues I'm having.
>
> Thanks,
> -Stolee
>
> Derrick Stolee (6):
>   DO NOT MERGE: compute commit-graph on every commit
>   DO NOT MERGE: write commit-graph on every fetch
>   commit-graph: enable replace-object and grafts
>   commit-graph: avoid writing when repo is shallow
>   fetch: destroy commit graph on shallow parameters
>   commit-graph: revert to odb on missing parents
>
>  builtin/commit.c  |  5 +++++
>  builtin/fetch.c   | 10 +++++++++
>  builtin/gc.c      |  2 +-
>  builtin/replace.c |  3 +++
>  commit-graph.c    | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>  commit-graph.h    |  9 ++++++++
>  commit.c          |  5 +++++
>  environment.c     |  2 +-
>  8 files changed, 95 insertions(+), 6 deletions(-)

      parent reply	other threads:[~2018-06-08 12:00 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
2018-05-31 17:41 ` [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit Derrick Stolee
2018-05-31 19:39   ` Stefan Beller
2018-05-31 17:41 ` [RFC PATCH 2/6] DO NOT MERGE: write commit-graph on every fetch Derrick Stolee
2018-05-31 17:41 ` [RFC PATCH 3/6] commit-graph: enable replace-object and grafts Derrick Stolee
2018-06-09 15:47   ` Jakub Narebski
2018-05-31 17:41 ` [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow Derrick Stolee
2018-05-31 19:07   ` Stefan Beller
2018-06-01  2:30   ` Junio C Hamano
2018-06-01 11:46     ` Derrick Stolee
2018-06-02 18:39       ` Jakub Narebski
2018-06-04  2:19       ` Junio C Hamano
2018-05-31 17:41 ` [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters Derrick Stolee
2018-05-31 19:29   ` Stefan Beller
2018-05-31 17:41 ` [RFC PATCH 6/6] commit-graph: revert to odb on missing parents Derrick Stolee
2018-05-31 18:33 ` [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Stefan Beller
2018-06-01  1:09   ` Derrick Stolee
2018-06-08 11:59 ` Jakub Narebski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=867en9pjyb.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.