From: Jeremy Maitin-Shepard <jbms@cmu.edu>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Junio C Hamano <gitster@pobox.com>, Nicolas Pitre <nico@cam.org>,
Brandon Casey <casey@nrlssc.navy.mil>,
Geert Bosch <bosch@adacore.com>, Jeff King <peff@peff.net>,
git@vger.kernel.org
Subject: Re: git gc & deleted branches
Date: Fri, 09 May 2008 20:43:52 -0400
Message-ID: <873aoryomv.fsf@jeremyms.com>
In-Reply-To: <20080510002014.GH29038@spearce.org> (Shawn O. Pearce's message of "Fri, 9 May 2008 20:20:14 -0400")
"Shawn O. Pearce" <spearce@spearce.org> writes:
> Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
>> It is extremely cumbersome to have to worry about whether there are
>> other concurrent accesses to the repository when running e.g. git gc.
>> For servers, you may never be able to guarantee that nothing else is
>> accessing the repository concurrently. Here is a possible solution:
>>
>> Each git process creates a log file of the references that it has
>> created. The log file should be named in some way with e.g. the process
>> id and start time of the process, and simply consist of a list of
>> 20-byte sha1 hashes to be considered additional in-use references for
>> the purpose of garbage collection.
> I believe we partially considered that in the past and discarded it
> as far too complex implementation-wise for the benefit it gives us.
It doesn't seem all that complex, and I'd say that fundamentally it is
the _correct_ way to do things. Being sloppy is always easier in the
short run, but it then either leaves the system permanently broken or
results in a lot of "fixing up" work later. I think almost all of the
work of handling these log files could be done without impacting a lot
of code that calls the relevant APIs that would actually use the log
files. I think the biggest impact would be on non-C code, but even for
that code, an appropriate wrapper could be used to avoid having to make
many changes.
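To make the proposal concrete, here is a minimal sketch of such a wrapper, assuming one log file per process named `<pid>-<start_time>.inuse` inside a hypothetical log directory. The class name `InUseLog`, the helper `extra_roots`, the file suffix, and the directory layout are all illustrative assumptions and not anything git implements. The log is a flat list of raw 20-byte SHA-1 hashes, and it is removed on any normal or error exit from the `with` block, so only a kill -9 or a crash would leave it behind.

```python
import os
import time
from pathlib import Path

HASH_LEN = 20  # size of a raw SHA-1 digest in bytes


class InUseLog:
    """Hypothetical per-process log of object ids that gc must keep."""

    def __init__(self, log_dir, pid=None, start_time=None):
        # pid and start_time are overridable only to make the sketch testable.
        self.pid = os.getpid() if pid is None else pid
        self.start_time = int(time.time()) if start_time is None else start_time
        self.path = Path(log_dir) / f"{self.pid}-{self.start_time}.inuse"
        self._fh = None

    def __enter__(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._fh = open(self.path, "ab")
        return self

    def add(self, sha1):
        """Record one object id before the object is relied upon."""
        if len(sha1) != HASH_LEN:
            raise ValueError("expected a 20-byte SHA-1 digest")
        self._fh.write(sha1)
        self._fh.flush()
        os.fsync(self._fh.fileno())  # make the entry durable on disk

    def __exit__(self, exc_type, exc, tb):
        # Runs on success and on exceptions alike, so an ordinary error
        # exit does not leave a log file behind.
        self._fh.close()
        self.path.unlink()
        return False  # never swallow the caller's exception


def extra_roots(log_dir):
    """Collect every hash named in any log; gc would treat these as in use."""
    roots = set()
    for path in Path(log_dir).glob("*.inuse"):
        data = path.read_bytes()
        usable = len(data) - len(data) % HASH_LEN  # ignore a torn final entry
        for i in range(0, usable, HASH_LEN):
            roots.add(data[i:i + HASH_LEN])
    return roots
```

Code that creates objects would call `add()` inside a `with InUseLog(log_dir) as log:` block, and a garbage collector would union `extra_roots(log_dir)` with the hashes reachable from the refs before deciding what to prune.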
> The current approach of leaving unreachable loose objects around
> for 2 weeks is good enough. Any Git process that has been running
> for 2 weeks while still not linking everything it needs into the
> reachable refs of that repository is already braindamaged and
> shouldn't be running anymore.
This sort of reasoning just leads to an inherently unreliable system.
Sure, two weeks might seem good enough for nearly all cases, but why
_shouldn't_ I be able to leave my editor open for two weeks before
typing in my commit message and finishing the commit, or wait for two
weeks in the middle of a rebase? (It seems that in the new
implementation, temporary refs are created basically to do the same
thing as the log file I described.) I could easily be typing up my
commit message, then switch to something else, and happen not to come
back to it for two weeks.
Because such a "timeout" based solution isn't really the "correct
solution" but will work most of the time, potential problems won't be
noticed while testing.
Another significant issue is that this timeout means that unreferenced
junk has to stay around in the repository for two weeks for no (good)
reason.
> If we are dealing with a pack file, those are protected by .keep
> "lock files" between the time they are created on disk and the
> time that the git-fetch or git-receive-pack process has finished
> updating the refs to anchor the pack's contents as reachable.
> Every once in a while a stale .keep file gets left behind when a
> process gets killed by the OS, and it's damn annoying to clean up.
> I'd hate to clean up logs from every little git-add or git-commit
> that aborted in the middle uncleanly.
First of all, merely exiting due to an error should not cause log files
to be left around. The only thing that should cause log files to be
left around is kill -9 or a system crash. Second, by storing the
process id and a timestamp of when the log file was created, it is
possible to reliably determine if a log file is stale.
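A staleness check along these lines might look like the following sketch. It assumes the log name encodes `<pid>-<creation_time>` with a hypothetical `.inuse` suffix, and that the caller supplies the system boot time as `boot_time`; both are illustrative assumptions. The liveness probe uses signal 0, which is POSIX-only. The sketch is deliberately conservative: if the pid has been reused by an unrelated process, the log is kept rather than deleted. Comparing the live process's start time against the time in the log name would close that gap, but reading a process start time is platform-specific and is left out here.

```python
import os
import re

# Assumed naming scheme: "<pid>-<creation_time>.inuse" (hypothetical).
LOG_NAME = re.compile(r"^(\d+)-(\d+)\.inuse$")


def pid_is_running(pid):
    """Best-effort liveness check; POSIX only (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_stale(log_name, boot_time, is_running=pid_is_running):
    """Return True if the process that wrote log_name can no longer exist."""
    match = LOG_NAME.match(log_name)
    if match is None:
        raise ValueError(f"unrecognized log name: {log_name!r}")
    pid, created = int(match.group(1)), int(match.group(2))
    if created < boot_time:
        return True  # written before the last reboot, so the writer is gone
    # A live pid may be a reused one; keeping the log is the safe choice.
    return not is_running(pid)
```

A garbage collector could delete every log for which `is_stale()` returns True before reading the remaining logs as extra in-use references.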
--
Jeremy Maitin-Shepard