* Thoughts about eliminating reachability races
@ 2013-05-11 9:47 Michael Haggerty
0 siblings, 0 replies; only message in thread
From: Michael Haggerty @ 2013-05-11 9:47 UTC (permalink / raw)
To: git discussion list
How to get rid of reachability races?
This email is meant more as a basis for discussion (including at the
GitMerge conference that is currently running) than as a fully-baked
proposal.
A lot of reachability-related races have been discussed recently:
* If a reference is renamed while a prune is reading the loose
references, it could be that prune sees neither the old nor the new
version of the reference and could prune the objects referred to by
the reference.
* If a process is building a tree, it could find that some objects are
already in the DB, in which case it will refer to the existing
objects. But these objects might be unreachable, and could be
pruned before the process gets around to creating a root reference
that makes them reachable.
* Other races involving packed refs vs. loose refs.
These races are most critical for busy hosting sites like GitHub but
could theoretically impact anybody.
It has been discussed that we should offer, as an alternative to
packed-refs/loose refs in the filesystem, a way to store refs in a
single database store of some kind that offers ACID guarantees. This
would solve the first and last kind of race in one step.
Other mechanisms are being discussed to address the second race, maybe
involving moving pruned objects into a kind of purgatory for a period
of time before they are finally deleted.
But I have the feeling that these races could also be solved via the
use of the same database.
I am thinking of a scheme in which the database stores two more things
during prunes:
1. The names of objects that are being pruned. Such objects are
considered deleted (even though they might not yet have been
deleted from the object database) and no new references may be made
to them.
2. The names of objects to which new links have been added. Prune is
not allowed to delete this objects or objects to which they refer.
like those used for incremental garbage
collection of memory, in which *during* a garbage collection the VM
keeps tracks of new references that are made to objects that are in
the generation that is currently being garbage collected. The garbage
collector has to consider these links in its reachability analysis.
Our situation is a little bit more complicated, because it is not easy
to inform a git process that a prune has been started. At best, the
process can check occasionally whether a prune is running. On the
other hand, it *is* permissible for Git processes to fail in
exceptional circumstances, as long as it happens rarely and leaves the
object store in a valid state.
So I'm thinking of a scheme in which
1. prune has to list "doomed" objects to the database before deleting
them. Such objects are considered deleted by other processes (*if*
they know that a prune is running).
2. If a process knows that a prune is running, then it records to the
database any new links that it makes to existing objects. Prune is
not allowed to prune such objects.
3. To catch the situation that a process didn't know that a prune is
running, we keep a prune generation number. If that number changes
between the beginning/end of a process, then the process fails
without recording any new references.
Design goals:
* Correctness
* "Normal processes" don't block on each other or on prune.
* Normal processes don't get significantly slower in the usual case
(but are allowed to get slower during pruning).
Assumptions:
* There is a store with atomic transactions.
* Most normal processes run more quickly than a prune. It will be
rare for an entire prune to run while a normal process is running
(though even if it does we must behave correctly).
* Normal processes are allowed to fail occasionally as long as they
leave the DB in a valid state.
In the store, record the following values:
* Git references -- (refname, object_name) pairs
* prune_in_progress (boolean) -- a destructive operation is running
* prune_generation (integer) -- incremented each time objects are
invalidated
* doomed_objects -- a list of the names of objects that a prune is in
the process of deleting. These objects *are considered deleted* and
may not be referred to. Only used when prune_in_progress.
* new_links -- a list of the names of objects newly created or newly
referenced during a prune. These objects are considered referenced
and *may not be pruned*. Only used when prune_in_progress.
A nondestructive operation will usually have two interactions with the
store. At start::
def record_new_link(sha1):
"""Record that there is a new link to object sha1."""
if prune_in_progress:
if sha1 in doomed_objects:
die()
append sha1 to new_links (RAM copy)
with transaction:
read prune_generation
read prune_in_progress
if prune_in_progress:
read doomed_objects
read any needed Git refs
for each new object or new link to an old object:
record_new_link(object_name)
with transaction:
if prune_generation has changed:
die()
if prune_in_progress:
read current doomed_objects:
if any of new_links are in doomed_objects:
die()
append list of new git references to new_links
write changes to git references
# From this moment on, the new_links are considered to be
# reachable and neither they nor any objects that they refer to
# may be pruned.
A destructive operation will have at least three interactions with the
store::
with transaction:
if prune_in_progress:
abort
prune_in_progress = True
clear doomed_objects list
clear new_links list
known_links = list of all object references
doomed_objects = compute list of objects unreachable from known_links
while True:
with transaction:
write current version of doomed_objects
read new_links from store
clear new_links in store
# From this moment on, objects in doomed_objects are
# considered deleted.
if new_links is empty:
break
else:
remove objects reachable from new_links from doomed_objects
delete doomed objects from filesystem
with transaction:
clear list of doomed objects
increment stored value of prune_generation
prune_in_progress = False
I hope that's clear; please poke holes in it.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2013-05-11 9:47 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-11 9:47 Thoughts about eliminating reachability races Michael Haggerty
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).