From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Johan Herland <johan@herland.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 3/4] get_packed_refs: reload packed-refs file when it changes
Date: Mon, 13 May 2013 04:43:54 +0200
Message-ID: <5190536A.5090606@alum.mit.edu>
In-Reply-To: <20130507024313.GC22940@sigill.intra.peff.net>

On 05/07/2013 04:43 AM, Jeff King wrote:
> Once we read the packed-refs file into memory, we cache it
> to save work on future ref lookups. However, our cache may
> be out of date with respect to what is on disk if another
> process is simultaneously packing the refs. Normally it
> is acceptable for us to be a little out of date, since there
> is no guarantee whether we read the file before or after the
> simultaneous update. However, there is an important special
> case: our packed-refs file must be up to date with respect
> to any loose refs we read. Otherwise, we risk the following
> race condition:
> 
>   0. There exists a loose ref refs/heads/master.
> 
>   1. Process A starts and looks up the ref "master". It
>      first checks $GIT_DIR/master, which does not exist. It
>      then loads (and caches) the packed-refs file to see if
>      "master" exists in it, which it does not.
> 
>   2. Meanwhile, process B runs "pack-refs --all --prune". It
>      creates a new packed-refs file which contains
>      refs/heads/master, and removes the loose copy at
>      $GIT_DIR/refs/heads/master.
> 
>   3. Process A continues its lookup, and eventually tries
>      $GIT_DIR/refs/heads/master.  It sees that the loose ref
>      is missing, and falls back to the packed-refs file. But
>      it examines its cached version, which does not have
>      refs/heads/master. After trying a few other prefixes,
>      it reports master as a non-existent ref.
> 
> There are many variants (e.g., step 1 may involve process A
> looking up another ref entirely, so even a fully qualified
> refname can fail). One of the most interesting ones is if
> "refs/heads/master" is already packed. In that case process
> A will not see it as missing, but rather will report
> whatever value happened to be in the packed-refs file before
> process B repacked (which might be an arbitrarily old
> value).
> 
> We can fix this by making sure we reload the packed-refs
> file from disk after looking at any loose refs. That's
> unacceptably slow, so we can check its stat()-validity as a
> proxy, and read it only when it appears to have changed.
> 
> Reading the packed-refs file after performing any loose-ref
> system calls is sufficient because we know the ordering of
> the pack-refs process: it always makes sure the newly
> written packed-refs file is installed into place before
> pruning any loose refs. As long as those operations by B
> appear in their executed order to process A, by the time A
> sees the missing loose ref, the new packed-refs file must be
> in place.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I hooked the refreshing into get_packed_refs, since then all callers get
> it for free. It makes me a little nervous, though, just in case some
> caller really cares about calling get_packed_refs but not having the
> list of packed-refs change during the call. peel_ref looks like such a
> function, but isn't, for reasons I'll explain in a followup patch.
> 
> Clone also looks like such a caller, as it calls get_packed_refs once
> for each upstream ref it adds (it puts them in the packed-refs list, and
> then writes them all out at the end). But it's OK because there is no
> packed-refs file for it to refresh from.
> 
> An alternative would be to move the refreshing to an explicit
> refresh_packed_refs() function, and call it from a few places
> (resolve_ref, and later from do_for_each_ref).

I think this will be necessary, because otherwise there are too many
places where the packed-refs cache can be invalidated and re-read.

As a test, I changed your stat_validity_check() to return 0 *all* of the
time when a file exists.  I think this simulates a hyperactive
repository in which the packed-refs file changes every time it is
checked.  With this change, hundreds of tests in the test suite fail.
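
For readers who want to reproduce the idea outside of git, the following is
a minimal standalone sketch of a stat()-based validity check with a switch
that makes every existing file look changed, which approximates the
experiment described above. It is not the code from the patch: the names
validity_sketch, validity_clear, validity_update, validity_check and
force_stale are hypothetical, and the set of stat fields compared is an
assumption chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the patch's struct stat_validity. */
struct validity_sketch {
	int have_snapshot;     /* nonzero once a file has been recorded */
	struct stat snapshot;  /* stat data of the file when it was read */
};

/* Hypothetical test hook: when nonzero, every existing file looks changed. */
static int force_stale;

/* Forget any recorded snapshot. */
static void validity_clear(struct validity_sketch *v)
{
	memset(v, 0, sizeof(*v));
}

/* Record the identity of the open file we are about to read. */
static void validity_update(struct validity_sketch *v, int fd)
{
	v->have_snapshot = !fstat(fd, &v->snapshot);
}

/*
 * Return 1 if the file at "path" still matches the snapshot, 0 otherwise.
 * A missing file counts as valid only when no snapshot was ever taken.
 */
static int validity_check(const struct validity_sketch *v, const char *path)
{
	struct stat now;

	assert(v && path);
	if (stat(path, &now) < 0)
		return !v->have_snapshot;
	if (force_stale || !v->have_snapshot)
		return 0;
	return now.st_dev == v->snapshot.st_dev &&
	       now.st_ino == v->snapshot.st_ino &&
	       now.st_size == v->snapshot.st_size &&
	       now.st_mtime == v->snapshot.st_mtime;
}
```

A caller would invoke validity_update() right after opening the file it
caches and validity_check() before trusting the cached copy. Setting
force_stale to 1 makes every check on an existing file fail, so the cache is
dropped and re-read on each access.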

I haven't had time to dig into the failures.  One example is

git-upload-pack: refs.c:542: do_for_each_ref_in_dir: Assertion
`dir->sorted == dir->nr' failed.

This suggests that the packed ref cache was invalidated while somebody
was iterating over it, resulting perhaps in illegal memory references.
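
The suspected failure mode can be pictured with a small standalone sketch.
It is only a model, not git's data structures: a generation counter stands
in for "the cache was freed and rebuilt", so the example stays well-defined
instead of touching freed memory. All names (cache_sketch,
get_cache_implicit, refresh_cache_explicit, get_cache_explicit,
count_invalidations) are hypothetical. The explicit variant corresponds
loosely to the alternative of a separate refresh function called from a few
well-defined places.

```c
#include <assert.h>

/* Hypothetical miniature of a packed-refs cache; not git's structure. */
struct cache_sketch {
	unsigned generation;  /* bumped whenever the cache is rebuilt */
	int always_stale;     /* simulate a file that changes on every check */
};

/* Implicit refresh: any accessor call may rebuild the cache. */
static unsigned get_cache_implicit(struct cache_sketch *c)
{
	if (c->always_stale)
		c->generation++;  /* stands in for clear + re-read */
	return c->generation;
}

/* Explicit refresh: rebuild only where the caller asks for it. */
static void refresh_cache_explicit(struct cache_sketch *c)
{
	if (c->always_stale)
		c->generation++;
}

/* Plain accessor that never rebuilds the cache. */
static unsigned get_cache_explicit(const struct cache_sketch *c)
{
	return c->generation;
}

/*
 * Walk n entries. Handling each entry looks up the cache again, as a
 * nested ref lookup might. Return how many entries were handled after
 * the generation seen at the start of the walk had been invalidated.
 */
static int count_invalidations(struct cache_sketch *c, int n, int implicit)
{
	unsigned start;
	int i, invalidated = 0;

	assert(c && n >= 0);
	if (!implicit)
		refresh_cache_explicit(c);  /* one refresh, before iterating */
	start = implicit ? get_cache_implicit(c) : get_cache_explicit(c);
	for (i = 0; i < n; i++) {
		unsigned seen = implicit ? get_cache_implicit(c)
					 : get_cache_explicit(c);
		if (seen != start)
			invalidated++;  /* the iterator's view is now stale */
	}
	return invalidated;
}
```

With always_stale set, the implicit variant invalidates the iterator's view
on every nested lookup, while the explicit variant refreshes once up front
and then iterates over a stable snapshot.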

Maybe my test is misguided.  But I definitely would not proceed on this
patch until the situation has been understood better.

Michael

>  refs.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index 5a14703..6afe8cc 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -708,6 +708,7 @@ static struct ref_cache {
>  	struct ref_cache *next;
>  	struct ref_entry *loose;
>  	struct ref_entry *packed;
> +	struct stat_validity packed_validity;
>  	/* The submodule name, or "" for the main repo. */
>  	char name[FLEX_ARRAY];
>  } *ref_cache;
> @@ -717,6 +718,7 @@ static void clear_packed_ref_cache(struct ref_cache *refs)
>  	if (refs->packed) {
>  		free_ref_entry(refs->packed);
>  		refs->packed = NULL;
> +		stat_validity_clear(&refs->packed_validity);
>  	}
>  }
>  
> @@ -878,17 +880,25 @@ static struct ref_dir *get_packed_refs(struct ref_cache *refs)
>  
>  static struct ref_dir *get_packed_refs(struct ref_cache *refs)
>  {
> +	const char *packed_refs_file;
> +
> +	if (*refs->name)
> +		packed_refs_file = git_path_submodule(refs->name, "packed-refs");
> +	else
> +		packed_refs_file = git_path("packed-refs");
> +
> +	if (refs->packed &&
> +	    !stat_validity_check(&refs->packed_validity, packed_refs_file))
> +		clear_packed_ref_cache(refs);
> +
>  	if (!refs->packed) {
> -		const char *packed_refs_file;
>  		FILE *f;
>  
>  		refs->packed = create_dir_entry(refs, "", 0, 0);
> -		if (*refs->name)
> -			packed_refs_file = git_path_submodule(refs->name, "packed-refs");
> -		else
> -			packed_refs_file = git_path("packed-refs");
> +
>  		f = fopen(packed_refs_file, "r");
>  		if (f) {
> +			stat_validity_update(&refs->packed_validity, fileno(f));
>  			read_packed_refs(f, get_ref_dir(refs->packed));
>  			fclose(f);
>  		}
> 


-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
