* Fixing the git-repack replacement gap?
@ 2013-06-18 16:52 Martin Fick
From: Martin Fick @ 2013-06-18 16:52 UTC
To: git, Shawn Pearce
I have been trying to think of ways to fix git-repack so
that it no longer momentarily makes the objects in a repo
inaccessible to all processes when it replaces packfiles
with the same objects in them as an already existing pack
file. To be more explicit, I am talking about the way it
moves the existing pack file (and index) to old-<sha1>.pack
before moving the new packfile in place. During this moment
in time the objects in that packfile are simply not
available to anyone using the repo. This can be
particularly problematic for busy servers.
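To make the gap concrete, here is a minimal Python sketch of the rename
sequence described above. Only the old-<sha1> name comes from the
description; the tmp-<sha1> staging name, the order of the renames, and
the helper names are assumptions for illustration, not git's actual code.

```python
import os
import tempfile

EXTS = ("pack", "idx")

def pack_visible(pack_dir, sha1):
    """A reader can use the pack only if both the .pack and the .idx exist."""
    return all(
        os.path.exists(os.path.join(pack_dir, f"pack-{sha1}.{ext}"))
        for ext in EXTS
    )

def replace_with_gap(pack_dir, sha1):
    """Swap tmp-<sha1>.* in for pack-<sha1>.*, recording visibility after each rename."""
    observed = []
    # Move the existing pack and index out of the way (old-<sha1>.*).
    for ext in EXTS:
        os.rename(os.path.join(pack_dir, f"pack-{sha1}.{ext}"),
                  os.path.join(pack_dir, f"old-{sha1}.{ext}"))
        observed.append(pack_visible(pack_dir, sha1))
    # Move the freshly written pack and index into place.
    # tmp-<sha1>.* is a hypothetical staging name.
    for ext in EXTS:
        os.rename(os.path.join(pack_dir, f"tmp-{sha1}.{ext}"),
                  os.path.join(pack_dir, f"pack-{sha1}.{ext}"))
        observed.append(pack_visible(pack_dir, sha1))
    return observed

def demo():
    """Set up one existing pack and one staged replacement, then swap them."""
    with tempfile.TemporaryDirectory() as pack_dir:
        for prefix in ("pack", "tmp"):
            for ext in EXTS:
                path = os.path.join(pack_dir, f"{prefix}-abc123.{ext}")
                with open(path, "w") as f:
                    f.write("placeholder")
        return replace_with_gap(pack_dir, "abc123")

if __name__ == "__main__":
    print(demo())
```

In this model the pack only becomes visible again after the last of the
four renames; any reader that looks in between finds the objects missing.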
There are likely at least 2 ways in which the fundamental
design of packfiles, their indexes, and their names has led
to this issue. If the packfile and index were stored in a
single file, they could have been replaced atomically and
thus it would potentially avoid the issue of them being
temporarily inaccessible (although admittedly that might not
work anyway on some filesystems). Alternatively, if the
pack file were named after the sha1 of the packed contents
of the file instead of the sha1 of the objects in the pack,
then the replacement would never need to happen since it
makes no sense to replace a file with another file with the
exact same contents (unless, of course, the first one is
corrupt, but then you aren't likely making the repo
temporarily worse, you are fixing a broken repo).
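The difference between the two naming schemes can be sketched in a few
lines of Python. This assumes the pack name is the SHA-1 over the sorted
ids of the contained objects, as described above; the function names and
the fake pack bytes are made up for illustration.

```python
import hashlib

def name_from_object_ids(object_ids):
    """Name derived only from which objects are in the pack (hex ids)."""
    h = hashlib.sha1()
    for oid in sorted(object_ids):
        h.update(bytes.fromhex(oid))
    return h.hexdigest()

def name_from_contents(pack_bytes):
    """Alternative: name derived from the bytes of the pack file itself."""
    return hashlib.sha1(pack_bytes).hexdigest()

# Two stand-in object ids, and two different encodings of the same objects
# (for example, different delta or compression choices).
ids = [hashlib.sha1(b"a").hexdigest(), hashlib.sha1(b"b").hexdigest()]
pack_v1 = b"PACK" + b"objects encoded one way"
pack_v2 = b"PACK" + b"objects encoded another way"

# Same objects give the same name no matter how they were packed ...
same_name = name_from_object_ids(ids) == name_from_object_ids(ids[::-1])
# ... while content-based names differ whenever the bytes differ.
different_name = name_from_contents(pack_v1) != name_from_contents(pack_v2)

if __name__ == "__main__":
    print(same_name, different_name)
```

With object-based names, a repack that changes only the encoding lands on
the existing filename, which is what forces the replacement. With
content-based names, a name collision would imply identical bytes.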
I suspect these 2 ideas have been discussed before, but
since they are fundamental changes to the way pack files
work (and thus would not be backwards compatible), they are
not likely to get implemented soon. This got me wondering
if there wasn't an easier backwards compatible solution to
avoid making the objects inaccessible?
It seems like the problem could be avoided if we could
simply change the name of the pack file when a replacement
would be needed? Of course, if we just changed the name,
then the name would not match the sha1 of the contained
objects and would likely be considered bad by git? So, what
if we could simply add a dummy object to the file to cause
it to deserve a name change?
So the idea would be, have git-repack detect the conflict in
filenames and have it repack the new file with an additional
dummy (unused) object in it, and then deliver the new file
which no longer conflicts. Would this be possible? If so,
what sort of other problems would this cause? It would
likely create an unreferenced object, which the next
git-repack would likely want to prune. Is that OK? Maybe
you even want it pruned, because then the pack file will be
repacked once again later without the dummy object, again
avoiding the temporarily inaccessible period for objects in
the file.
Hmm, but then maybe that could even be done in a single
git-repack run (at the expense of extra disk space)?
1) Detect the conflict
2) Save the replacement file
3) Create a new packfile with the dummy object
4) Put the new file with the dummy object into service
5) Remove the old conflicting file (no gap)
6) Place the new conflicting file in service (no dummy)
7) Remove the new file with dummy object (no gap again)
done? Would it work?
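The seven steps can be tried out in a toy model. The Python sketch below
treats a pack as a single text file listing object ids and ignores the
separate index, which is a simplification; the tmp- staging names and all
function names are hypothetical.

```python
import os
import tempfile

def available_objects(pack_dir):
    """Union of the object ids in every in-service pack (pack-*.pack)."""
    found = set()
    for name in os.listdir(pack_dir):
        if name.startswith("pack-") and name.endswith(".pack"):
            with open(os.path.join(pack_dir, name)) as f:
                found.update(f.read().split())
    return found

def write_objects(path, object_ids):
    with open(path, "w") as f:
        f.write("\n".join(sorted(object_ids)))

def replace_without_gap(pack_dir, name, dummy_name, objects, dummy_id):
    """Run steps 1-7; return True if the objects stayed available throughout."""
    final = os.path.join(pack_dir, f"pack-{name}.pack")
    saved = os.path.join(pack_dir, f"tmp-{name}.pack")
    dummy_tmp = os.path.join(pack_dir, f"tmp-{dummy_name}.pack")
    dummy_final = os.path.join(pack_dir, f"pack-{dummy_name}.pack")
    wanted = set(objects)
    ok = []

    def check():
        ok.append(wanted <= available_objects(pack_dir))

    if not os.path.exists(final):                   # 1) detect the conflict
        raise ValueError("no conflicting pack to replace")
    write_objects(saved, wanted)                    # 2) save the replacement
    check()
    write_objects(dummy_tmp, wanted | {dummy_id})   # 3) pack with the dummy
    check()
    os.rename(dummy_tmp, dummy_final)               # 4) dummy pack in service
    check()
    os.remove(final)                                # 5) remove the old pack
    check()
    os.rename(saved, final)                         # 6) replacement in service
    check()
    os.remove(dummy_final)                          # 7) remove the dummy pack
    check()
    return all(ok)

def demo():
    with tempfile.TemporaryDirectory() as pack_dir:
        write_objects(os.path.join(pack_dir, "pack-abc.pack"), {"o1", "o2"})
        kept = replace_without_gap(pack_dir, "abc", "def", {"o1", "o2"}, "dummy")
        return kept, sorted(os.listdir(pack_dir))

if __name__ == "__main__":
    print(demo())
```

In this model the objects are reachable after every step, at the cost of
briefly holding three copies on disk. Whether readers that have already
opened the old pack behave well is outside what the sketch covers.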
If so, is there an easy way to create the dummy file? Can
any object simply be added at the end of a pack file after
the fact (and then added to the index too)? Also, what
should the dummy object be? Is there some sort of null
object that would be tiny and that would never already be in
the pack?
Thanks for any thoughts,
-Martin
* Re: Fixing the git-repack replacement gap?
@ 2013-06-18 17:17 ` Junio C Hamano
From: Junio C Hamano @ 2013-06-18 17:17 UTC
To: Martin Fick; +Cc: git, Shawn Pearce
Martin Fick <mfick@codeaurora.org> writes:
> ... So, what
> if we could simply add a dummy object to the file to cause
> it to deserve a name change?
>
> So the idea would be, have git-repack detect the conflict in
> filenames and have it repack the new file with an additional
> dummy (unused) object in it, and then deliver the new file
> which no longer conflicts. Would this be possible?
Sounds like a fun exercise. I do not think it breaks anything, and
because we have the list of objects to be placed in the resulting
pack fairly early in the process, this sequence would be possible:
(1) enumerate the objects;
(2) compute the resulting packname;
(3) notice it is the same as an existing one;
(4) add another dummy object and go back to (2);
(5) do the heavy lifting of deltifying;
(6) write out the resulting pack.
inside pack-objects.
I do not know if the loop between (2) and (4) is the only necessary
thing to completely avoid the race you are worrying about, though.
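The loop between (2) and (4) might look roughly like the Python sketch
below. It assumes the pack name is the SHA-1 over the sorted object ids
and invents a numbered dummy blob to force a new name; the function names
and the dummy contents are illustrative only, and nothing here is taken
from pack-objects itself.

```python
import hashlib

def blob_id(content):
    """Object id git assigns to a blob with the given content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def pack_name(object_ids):
    """Assumed naming rule: SHA-1 over the sorted binary object ids."""
    h = hashlib.sha1()
    for oid in sorted(object_ids):
        h.update(bytes.fromhex(oid))
    return h.hexdigest()

def pick_objects(object_ids, existing_names):
    """Steps (1)-(4): add dummy objects until the pack name is unused."""
    chosen = set(object_ids)                        # (1) enumerate the objects
    counter = 0
    while pack_name(chosen) in existing_names:      # (2) compute, (3) notice
        # (4) add a hypothetical dummy blob, then recompute the name
        chosen.add(blob_id(b"repack dummy %d\n" % counter))
        counter += 1
    return chosen                                   # ready for (5) and (6)

if __name__ == "__main__":
    objs = {blob_id(b"a"), blob_id(b"b")}
    existing = {pack_name(objs)}
    print(len(pick_objects(objs, existing)) - len(objs), "dummy object(s) added")
```

The empty blob would be the smallest possible dummy, but it may already
be in the pack, which is why the sketch numbers its dummies instead.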