From: Nicolas Pitre <nico@fluxnic.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, git-dev@github.com,
"Shawn O. Pearce" <spearce@spearce.org>
Subject: Re: [PATCH] pack-objects: protect against disappearing packs
Date: Thu, 13 Oct 2011 22:42:28 -0400 (EDT) [thread overview]
Message-ID: <alpine.LFD.2.02.1110132234210.17040@xanadu.home> (raw)
In-Reply-To: <20111014012320.GA4395@sigill.intra.peff.net>
On Thu, 13 Oct 2011, Jeff King wrote:
> It's possible that while pack-objects is running, a
> simultaneously running prune process might delete a pack
> that we are interested in. Because we load the pack indices
> early on, we know that the pack contains our item, but by
> the time we try to open and map it, it is gone.
>
> Since c715f78, we already protect against this in the normal
> object access code path, but pack-objects accesses the packs
> at a lower level. In the normal access path, we call
> find_pack_entry, which will call find_pack_entry_one on each
> pack index, which does the actual lookup. If it gets a hit,
> we will actually open and verify the validity of the
> matching packfile (using c715f78's is_pack_valid). If we
> can't open it, we'll issue a warning and pretend that we
> didn't find it, causing us to go on to the next pack (or on
> to loose objects).
>
> Furthermore, we will cache the descriptor to the opened
> packfile. Which means that later, when we actually try to
> access the object, we are likely to still have that packfile
> opened, and won't care if it has been unlinked from the
> filesystem.
>
> Notice the "likely" above. If there is another pack access
> in the interim, and we run out of descriptors, we could
> close the pack. And then a later attempt to access the
> closed pack could fail (we'll try to re-open it, of course,
> but it may have been deleted). In practice, this doesn't
> happen because we tend to look up items and then access them
> immediately.
>
> Pack-objects does not follow this code path. Instead, it
> accesses the packs at a much lower level, using
> find_pack_entry_one directly. This means we skip the
> is_pack_valid check, and may end up with the name of a
> packfile, but no open descriptor.
>
> We can add the same is_pack_valid check here. Unfortunately,
> the access patterns of pack-objects are not quite as nice
> for keeping lookup and object access together. We look up
> each object as we find out about it, and the only later when
> writing the packfile do we necessarily access it. Which
> means that the opened packfile may be closed in the interim.
>
> In practice, however, adding this check still has value, for
> three reasons.
>
> 1. If you have a reasonable number of packs and/or a
> reasonable file descriptor limit, you can keep all of
> your packs open simultaneously. If this is the case,
> then the race is impossible to trigger.
>
> 2. Even if you can't keep all packs open at once, you
> may end up keeping the deleted one open (i.e., you may
> get lucky).
>
> 3. The race window is shortened. You may notice early that
> the pack is gone, and not try to access it. Triggering
> the problem without this check means deleting the pack
> any time after we read the list of index files, but
> before we access the looked-up objects. Triggering it
> with this check means deleting the pack means deleting
> the pack after we do a lookup (and successfully access
> the packfile), but before we access the object. Which
> is a smaller window.
> ---
you should put your SOB above that line I would think.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
> We're seeing this at GitHub because we prune pretty
> aggressively. We let pushes go into individual repositories,
> but then we kick off a job to move the resulting objects
> into the repository's "network" repo, which is basically a
> big alternates repository for related repos.
While this patch certainly has value, it doesn't provide 100%
reliability for that use case. Maybe the github infrastructure should
simply skip any auto-repack on push if some other object maintenance
operation is ongoing, possibly via the pre-auto-gc hook.
Nicolas
next prev parent reply other threads:[~2011-10-14 2:42 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-14 1:23 [PATCH] pack-objects: protect against disappearing packs Jeff King
2011-10-14 1:31 ` Jeff King
2011-10-14 2:43 ` Nicolas Pitre
2011-10-14 2:42 ` Nicolas Pitre [this message]
2011-10-14 13:02 ` Jeff King
2011-10-14 7:06 ` Johannes Sixt
2011-10-14 13:07 ` Jeff King
2011-10-14 14:35 ` Johannes Sixt
2011-10-14 18:03 ` [PATCH 1/2] " Jeff King
2011-10-14 18:04 ` [PATCH 2/2] downgrade "packfile cannot be accessed" errors to warnings Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.02.1110132234210.17040@xanadu.home \
--to=nico@fluxnic.net \
--cc=git-dev@github.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).