From: Jeremy Maitin-Shepard <jbms@cmu.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
Nicolas Pitre <nico@cam.org>,
Brandon Casey <casey@nrlssc.navy.mil>,
Geert Bosch <bosch@adacore.com>, Jeff King <peff@peff.net>,
git@vger.kernel.org
Subject: Re: git gc & deleted branches
Date: Fri, 09 May 2008 21:51:15 -0400
Message-ID: <87y76jx6y4.fsf@jeremyms.com>
In-Reply-To: <7vskwr9coz.fsf@gitster.siamese.dyndns.org> (Junio C. Hamano's message of "Fri, 09 May 2008 18:21:00 -0700")
Junio C Hamano <gitster@pobox.com> writes:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
>> Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
>>> It is extremely cumbersome to have to worry about whether there are
>>> other concurrent accesses to the repository when running e.g. git gc.
>>> For servers, you may never be able to guarantee that nothing else is
>>> accessing the repository concurrently. Here is a possible solution:
>>>
>>> Each git process creates a log file of the references that it has
>>> created. The log file should be named in some way with e.g. the process
>>> id and start time of the process, and simply consist of a list of
>>> 20-byte sha1 hashes to be considered additional in-use references for
>>> the purpose of garbage collection.
> How would that solve the issue that you should not prune/gc the repository
> "clone --shared" aka "alternates" borrows from?
The log files are only for handling in-progress commands editing the
repository. In the first part of the e-mail I also describe a possible
solution to that issue, as well as to the issues created by having
multiple working directories:
When you create a new working directory, you would also create in the
original repository a symlink named
e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that
points to the .git directory of the newly created working directory.
git clone --shared would likewise create such a link in the original
repository. There could be a separate simple command to "destroy" a
repository created via clone --shared or via new-work-dir that would
simply remove this "peer" symlink from any repositories it shares from,
and then rm -rf the target repository. The list of repositories that a
given target repository shares from would be discovered using perhaps
several different methods, depending on whether it is a new work dir, an
actual separate repository, or the new type of "shared" repository I
suggested in my original e-mail, namely one that has its own refs but
completely shares the object store of the original repository, e.g. via
a symlink to the original repository's objects directory. In any case, I
believe the information to go "upstream" is already available, and we
just need to add those "peer" symlinks in order to be able to go
"downstream".
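To make the "peer" symlink idea concrete, here is a minimal sketch in
Python. It is only an illustration of the proposal, not existing git
behavior: the peers directory layout follows the description above, and
the helper names register_peer and list_peers are hypothetical.

```python
import os
import uuid


def register_peer(orig_git_dir, new_git_dir):
    """Record new_git_dir as a downstream peer of orig_git_dir.

    Hypothetical helper: creates orig_git_dir/peers/<arbitrary-name> as a
    symlink pointing at the new repository's .git directory.
    """
    peers_dir = os.path.join(orig_git_dir, "peers")
    os.makedirs(peers_dir, exist_ok=True)
    # The link name carries no meaning, so a random one is good enough.
    link = os.path.join(peers_dir, uuid.uuid4().hex)
    os.symlink(os.path.abspath(new_git_dir), link)
    return link


def list_peers(orig_git_dir):
    """Return the resolved .git directories of all registered peers."""
    peers_dir = os.path.join(orig_git_dir, "peers")
    if not os.path.isdir(peers_dir):
        return []
    return sorted(
        os.path.realpath(os.path.join(peers_dir, name))
        for name in os.listdir(peers_dir)
    )
```

A "destroy" command would do the reverse: remove the matching link from
each upstream repository and then delete the target repository.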
There could also be a simple git command to move a repository that would
take care of updating all of the references that other repositories have
to it. Currently it is not possible to write such a command, because
the "downstream" links are not stored, but with these added symlinks it
would be possible.
As I said in my previous e-mail, if git gc finds any broken symlinks
(i.e. symlinks that point to invalid repositories), it would error out,
because user attention is required to specify whether the symlinks
correspond to deleted repositories, or to repositories that have been
moved without making the proper updates.
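The check that git gc would run before pruning could look roughly like
the following Python sketch. The names check_peers and BrokenPeerError
are hypothetical, and treating "has a HEAD file" as the test for a valid
repository is an assumption made only for this illustration.

```python
import os


class BrokenPeerError(Exception):
    """Raised when a peer symlink does not point at a usable repository."""


def check_peers(orig_git_dir):
    """Return the valid peer .git directories, or raise on a broken link."""
    peers_dir = os.path.join(orig_git_dir, "peers")
    if not os.path.isdir(peers_dir):
        return []
    valid, broken = [], []
    for name in sorted(os.listdir(peers_dir)):
        link = os.path.join(peers_dir, name)
        target = os.path.realpath(link)
        # Assumption: a directory containing HEAD counts as a repository.
        if os.path.isdir(target) and os.path.exists(os.path.join(target, "HEAD")):
            valid.append(target)
        else:
            broken.append(link)
    if broken:
        # Refuse to continue: the user must say whether each peer was
        # deleted or merely moved without updating the link.
        raise BrokenPeerError("broken peer links: " + ", ".join(broken))
    return valid
```

Only when this returns normally would gc go on to treat the refs of
every listed peer as additional roots.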
> By the way, I do not think your "git-commit stopped for two weeks due to a
> long editing session of the commit message" should result in any object
> lossage, as the new objects are all reachable from the index, and the new
> tree nor the new commit hasn't been built while you are typing (rather,
> not typing) the log message.
> Hmm, a partial commit that uses a temporary index file may lose, come to
> think of it. Perhaps we should teach reachable.c about the temporary
> index file as well. I dunno.
Well, providing a generic mechanism for telling git about reachable
things other than the index and refs is precisely what these log files
would do. Also, because they would record the process id and a
timestamp, stale log files would automatically get cleaned up. If each
individual git command has its own special way of trying to keep track
of temporary references, it is just going to be more complicated and
more error prone.
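As a rough illustration of the log-file mechanism, here is a Python
sketch under several assumptions that are not part of the proposal
itself: the gc-logs directory name, the "<pid>-<start time>.log" naming,
and all function names are hypothetical. The liveness probe is
POSIX-only and checks only the process id; verifying the recorded start
time against the running process (to guard against pid reuse) is
platform-specific and left out.

```python
import os


def log_path(git_dir, pid, start_time):
    """Per-process log file name built from the pid and start time."""
    return os.path.join(git_dir, "gc-logs", f"{pid}-{int(start_time)}.log")


def record_object(git_dir, pid, start_time, sha1):
    """Append one 20-byte binary SHA-1 to this process's log file."""
    if len(sha1) != 20:
        raise ValueError("expected a 20-byte sha1")
    path = log_path(git_dir, pid, start_time)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "ab") as f:
        f.write(sha1)


def pid_is_alive(pid):
    """POSIX-only probe: signal 0 tests for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def extra_gc_roots(git_dir, is_alive=pid_is_alive):
    """Collect hex object names from live logs and delete stale logs."""
    logs_dir = os.path.join(git_dir, "gc-logs")
    roots = set()
    if not os.path.isdir(logs_dir):
        return roots
    for name in os.listdir(logs_dir):
        path = os.path.join(logs_dir, name)
        pid = int(name.split("-", 1)[0])
        if not is_alive(pid):
            os.remove(path)  # the writer is gone, so the log is stale
            continue
        with open(path, "rb") as f:
            data = f.read()
        # Ignore a trailing partial record from an in-progress write.
        for i in range(0, len(data) - len(data) % 20, 20):
            roots.add(data[i:i + 20].hex())
    return roots
```

A gc run would add the returned object names to the set of roots it
walks, alongside the refs and the index.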
--
Jeremy Maitin-Shepard