git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: "Torsten Bögershausen" <tboegi@web.de>,
	"Junio C Hamano" <gitster@pobox.com>,
	"David Turner" <dturner@twopensource.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
	git@vger.kernel.org, "David Turner" <dturner@twopensource.com>
Subject: Re: [PATCH v2 29/33] refs: resolve symbolic refs first
Date: Fri, 13 May 2016 14:33:20 +0200	[thread overview]
Message-ID: <5735C990.8080502@alum.mit.edu> (raw)
In-Reply-To: <20160512082526.GA20817@sigill.intra.peff.net>

On 05/12/2016 10:25 AM, Jeff King wrote:
> On Thu, May 12, 2016 at 03:45:28AM -0400, Jeff King wrote:
> 
>> So I'd expect us to hit that lock_ref_for_update() for each of the new
>> refs. But then we end up in verify_refname_available_dir(), which wants
>> to read all of the loose refs again. So we end up with a quadratic
>> number of calls to read_ref_full().
>>
>> I haven't found the actual bug yet. It may be something as simple as not
>> clearing REF_INCOMPLETE from the loose-ref cache when we ought to. But
>> that's a wild (optimistic) guess.
> 
> Ah, nope, nothing so simple.
> 
> It looks like we get in a chain of:
> 
>   1. we want to update a ref, so we end up in
>      verify_refname_available_dir to make sure we can do so.
> 
>   2. that has to load all of the loose refs in refs/tags, which is
>      expensive.
> 
>   3. we actually update the ref, which calls clear_loose_ref_cache().
> 
> And repeat. Each ref update does the expensive step 2, and it gets
> larger and larger each time.
> 
> I understand why we need to verify_refname_available() on the packed
> refs. But traditionally we would rely on EISDIR or EEXIST to detect
> conflicts with the loose refs, rather than loading using our own cache.
> So I guess that's the new thing that is causing a problem.

Torsten, thanks for the report. Peff, thanks for the analysis.

The problem in this case is a misguided call to
verify_refname_available_dir() in the case that read_raw_ref() fails
with ENOENT. In that case it is not possible for there to be a conflict
with another loose reference, because (1) we already hold the lock, so
the containing directory must exist, and (2) we got ENOENT, so there
can't be a loose reference in a subdirectory named after the reference
that we are trying to create.

As Peff explained, the call of verify_refname_available_dir() was
forcing the loose tags to be loaded, which is expensive in this test
because there are 100000 of them being created one at a time. (If they
were created in a single ref_transaction instead, the "available" tests
would all be done together, before any changes are committed, so the
loose ref cache would not have to be invalidated each time.)

So instead of calling verify_refname_available_dir() here, we should
just consider the reference to be missing but available to be written.

I'll rewrite this patch and submit the new version to the mailing list
as v3 (also with a fix in the commit message). The rest of the patch
series is OK as is, so I won't resend it. The entire series is also
available from my GitHub repo [1] as branch "split-under-lock".

Please note that there are still some calls of
verify_refname_available_dir() against the loose reference cache in this
function. If we wanted to give up a little bit on the quality of our
error messages, I think we could make those paths faster, too. But they
are all in failure paths, so I don't think that they are performance
critical, so I won't make that change.

Michael

[1] https://github.com/mhagger/git

  reply	other threads:[~2016-05-13 12:33 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-06 16:13 [PATCH v2 00/33] Yet more preparation for reference backends Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 01/33] t1404: demonstrate a bug resolving references Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 02/33] commit_ref(): if there is an empty dir in the way, delete it Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 03/33] read_raw_ref(): don't get confused by an empty directory Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 04/33] safe_create_leading_directories(): improve docstring Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 05/33] remove_dir_recursively(): add docstring Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 06/33] refname_is_safe(): use skip_prefix() Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 07/33] refname_is_safe(): don't allow the empty string Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 08/33] refname_is_safe(): insist that the refname already be normalized Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 09/33] commit_ref_update(): write error message to *err, not stderr Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 10/33] rename_ref(): remove unneeded local variable Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 11/33] ref_transaction_commit(): remove local variable n Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 12/33] read_raw_ref(): rename flags argument to type Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 13/33] read_raw_ref(): clear *type at start of function Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 14/33] read_raw_ref(): rename symref argument to referent Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 15/33] read_raw_ref(): improve docstring Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 16/33] read_raw_ref(): move docstring to header file Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 17/33] lock_ref_sha1_basic(): remove unneeded local variable Michael Haggerty
2016-05-06 16:13 ` [PATCH v2 18/33] refs: make error messages more consistent Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 19/33] ref_transaction_create(): disallow recursive pruning Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 20/33] ref_transaction_commit(): correctly report close_ref() failure Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 21/33] delete_branches(): use resolve_refdup() Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 22/33] refs: allow log-only updates Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 23/33] refs: don't dereference on rename Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 24/33] verify_refname_available(): adjust constness in declaration Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 25/33] add_update(): initialize the whole ref_update Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 26/33] lock_ref_for_update(): new function Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 27/33] unlock_ref(): move definition higher in the file Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 28/33] ref_transaction_update(): check refname_is_safe() at a minimum Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 29/33] refs: resolve symbolic refs first Michael Haggerty
2016-05-12  7:45   ` Jeff King
2016-05-12  8:25     ` Jeff King
2016-05-13 12:33       ` Michael Haggerty [this message]
2016-05-13 12:35         ` [PATCH v3 " Michael Haggerty
2016-05-13 12:58           ` Jeff King
2016-05-13 12:51         ` [PATCH v2 " Jeff King
2016-05-14  9:02         ` Torsten Bögershausen
2016-05-06 16:14 ` [PATCH v2 30/33] lock_ref_for_update(): don't re-read non-symbolic references Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 31/33] lock_ref_for_update(): don't resolve symrefs Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 32/33] commit_ref_update(): remove the flags parameter Michael Haggerty
2016-05-06 16:14 ` [PATCH v2 33/33] lock_ref_sha1_basic(): only handle REF_NODEREF mode Michael Haggerty
2016-05-09 20:12 ` [PATCH v2 00/33] Yet more preparation for reference backends David Turner
2016-05-09 21:05   ` Junio C Hamano
2016-05-09 21:50     ` Michael Haggerty
2016-05-09 22:04       ` Junio C Hamano
2016-05-12  7:55         ` Jeff King
2016-05-12 16:10           ` Junio C Hamano
2016-05-10 21:32 ` Junio C Hamano
2016-06-10 12:50 ` Michael Haggerty
2016-06-10 15:43   ` Junio C Hamano
2016-06-13  9:55     ` [ADDENDUM v4] " Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5735C990.8080502@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).