git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/4] get_short_oid: cope with a possibly stale loose object cache
Date: Thu, 14 Mar 2019 08:33:07 -0700 (PDT)	[thread overview]
Message-ID: <pull.161.v2.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.161.git.gitgitgadget@gmail.com>

Being rather a power user of the interactive rebase, I discovered recently
that one of my scripts ran into an odd problem where an interactive rebase
appended new commands to the todo list, and the interactive rebase failed to
validate the (short) OIDs. But when I tried to fix things via git rebase
--edit-todo, the interactive rebase had no longer any problem to validate
them.

Turns out that I generated the objects during the interactive rebase (yes, I
implemented a 2-phase rebase
[https://github.com/git-for-windows/build-extra/blob/master/ever-green.sh]),
and that their hashes happened to share the first two hex digits with some
loose object that had already been read in the same interactive rebase run,
causing a now-stale loose object cache to be initialized for that 
.git/objects/?? subdirectory.

It was quite the, let's say, adventure, to track that one down.

Over at the companion PR for Git for Windows
[https://github.com/git-for-windows/git/pull/2121], I discussed this with
Peff (who introduced the loose object cache), and he pointed out that my
original solution was a bit too specific for the interactive rebase. He
suggested the current method of re-reading upon a missing object instead,
and explained patiently that this does not affect the code path for which
the loose object cache was introduced originally: to help with performance
issues on NFS when Git tries to find the same missing objects over and over
again.

The regression test still reflects what I need this fix for, and I would
rather keep it as-is (even if the fix is not about the interactive rebase
per se), as it really tests for the functionality that I really need to
continue to work. Since I (and other contributors touching the rebase code)
frequently run only the t34* subset of the test suite while developing git
rebase patches, I also want to keep the test case there, rather than putting
it in with other SHA1-specific disambiguation test cases in 
t1512-rev-parse-disambiguation.sh.

My only concern was that this might cause some performance regressions that
neither Peff nor I could think of, where get_oid() may run repeatedly into
missing objects by design, and where we should not blow away and recreate
the loose object cache all the time. But Peff dispelled my fears, pointing
out that get_oid() is already rather expensive and therefore avoided in hot
loops.

Changes since v1:

 * The added test case now properly quotes $GIT_REBASE_TODO (even if it does
   not contain spaces, it is simply safer to quote variable expansions in
   Bash always [https://mywiki.wooledge.org/Quotes].
 * The three lines to try again to disambiguate are no longer repeated.
   Instead, we use a Beautiful Goto.
 * The commit message of 2/2 does not stress lean on the loose object cache
   so much, but also talks about packs a bit more.

Johannes Schindelin (4):
  rebase -i: demonstrate obscure loose object cache bug
  sequencer: improve error message when an OID could not be parsed
  sequencer: move stale comment into correct location
  get_oid(): when an object was not found, try harder

 sequencer.c                 |  5 +++--
 sha1-name.c                 | 13 ++++++++++++-
 t/t3429-rebase-edit-todo.sh | 22 ++++++++++++++++++++++
 3 files changed, 37 insertions(+), 3 deletions(-)


base-commit: e902e9bcae2010bc42648c80ab6adc6c5a16a4a5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-161%2Fdscho%2Frereading-todo-list-and-loose-object-cache-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-161/dscho/rereading-todo-list-and-loose-object-cache-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/161

Range-diff vs v1:

 1:  b3fcd37765 ! 1:  63ddb1dd04 rebase -i: demonstrate obscure loose object cache bug
     @@ -60,11 +60,11 @@
      +			"committer A U Thor <author@example.org> $1 +0000" \
      +			"" \
      +			"$1" |
     -+		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO
     ++		  git hash-object -t commit -w --stdin))" >>"$GIT_REBASE_TODO"
      +
      +	shift
      +	test -z "$*" ||
     -+	echo "exec $0 $*" >>$GIT_REBASE_TODO
     ++	echo "exec $0 $*" >>"$GIT_REBASE_TODO"
      +	EOS
      +
      +	git rebase HEAD -x "./append-todo.sh 5 6"
 2:  d8c4a3dde5 = 2:  f0cbfacf9a sequencer: improve error message when an OID could not be parsed
 3:  d749c63170 = 3:  b55e14cd63 sequencer: move stale comment into correct location
 4:  994446236d ! 4:  41fc6eb9ba get_oid(): when an object was not found, try harder
     @@ -3,8 +3,9 @@
          get_oid(): when an object was not found, try harder
      
          It is quite possible that the loose object cache gets stale when new
     -    objects are written. In that case, get_oid() would potentially say that
     -    it cannot find a given object, even if it should find it.
     +    objects are written. Or that a new pack was installed. In those cases,
     +    get_oid() would potentially say that it cannot find a given object, even
     +    if it should find it.
      
          Let's blow away the loose object cache as well as the read packs and try
          again in that case.
     @@ -27,19 +28,31 @@
       --- a/sha1-name.c
       +++ b/sha1-name.c
      @@
     + 					 struct object_id *oid,
     + 					 unsigned flags)
     + {
     +-	int status;
     ++	int status, attempts = 0;
     + 	struct disambiguate_state ds;
     + 	int quietly = !!(flags & GET_OID_QUIETLY);
     + 
     +@@
     + 	else
     + 		ds.fn = default_disambiguate_hint;
     + 
     ++try_again:
     + 	find_short_object_filename(&ds);
       	find_short_packed_object(&ds);
       	status = finish_object_disambiguation(&ds, oid);
       
      +	/*
     -+	 * If we didn't find it, do the usual reprepare() slow-path,
     -+	 * since the object may have recently been added to the repository
     -+	 * or migrated from loose to packed.
     ++	 * If we did not find it, do the usual reprepare() slow-path, since the
     ++	 * object may have recently been added to the repository or migrated
     ++	 * from loose to packed.
      +	 */
     -+	if (status == MISSING_OBJECT) {
     ++	if (status == MISSING_OBJECT && !attempts++) {
      +		reprepare_packed_git(the_repository);
     -+		find_short_object_filename(&ds);
     -+		find_short_packed_object(&ds);
     -+		status = finish_object_disambiguation(&ds, oid);
     ++		goto try_again;
      +	}
      +
       	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {

-- 
gitgitgadget

  parent reply	other threads:[~2019-03-14 15:33 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
2019-03-13 16:11   ` Ævar Arnfjörð Bjarmason
2019-03-13 16:35     ` Jeff King
2019-03-13 16:53       ` Jeff King
2019-03-13 22:40         ` Johannes Schindelin
2019-03-14  0:07           ` Jeff King
2019-03-13 22:27       ` Johannes Schindelin
2019-03-14  0:05         ` Jeff King
2019-03-14  6:40       ` Johannes Sixt
2019-03-14 13:06         ` Johannes Schindelin
2019-03-14  1:12   ` Junio C Hamano
2019-03-14 12:44     ` Johannes Schindelin
2019-03-13 10:16 ` [PATCH 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
2019-03-14  1:14   ` Junio C Hamano
2019-03-13 10:16 ` [PATCH 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
2019-03-13 10:16 ` [PATCH 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
2019-03-14  1:29   ` Junio C Hamano
2019-03-14  2:22     ` Jeff King
2019-03-14  2:40       ` Jeff King
2019-03-14  4:05         ` Junio C Hamano
2019-03-14 18:55           ` Jeff King
2019-03-14  3:49       ` Junio C Hamano
2019-03-14 13:17     ` Johannes Schindelin
2019-03-14 18:57       ` Jeff King
2019-03-14 19:55         ` Johannes Schindelin
2019-03-13 15:33 ` [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Jeff King
2019-03-14 15:33 ` Johannes Schindelin via GitGitGadget [this message]
2019-03-14 15:33   ` [PATCH v2 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.161.v2.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).