From: Nicolas Pitre <nico@fluxnic.net>
To: demerphq <demerphq@gmail.com>
Cc: Git <git@vger.kernel.org>
Subject: Re: Dealing with many many git repos in a /home directory
Date: Thu, 04 Feb 2010 12:35:11 -0500 (EST)
Message-ID: <alpine.LFD.2.00.1002041207330.1681@xanadu.home>
In-Reply-To: <9b18b3111002040029x1c7de0afw4a5ef883588f7a18@mail.gmail.com>
On Thu, 4 Feb 2010, demerphq wrote:
> At $work we have a host where we have about 50-100 users each with
> their own private copies of the same repos. These are cloned from a
> remote via git/ssh and thus are not automatically hardlinking their
> object stores.
>
> This is starting to take a lot of space.
You should keep a pristine copy of that common repository on that host
and make it readable to everyone, and then ask your users to use the
--reference argument with 'git clone' to borrow as much as possible from
that common repository.
For those who have already cloned the repository in full, i.e. without the
--reference switch, it is possible to fix the situation simply by
adding the full path to the common repository's .git/objects directory
to their own .git/objects/info/alternates file (create it if it doesn't
exist) and then running 'git gc'. That's what the --reference argument to
the clone command does: it sets up that .git/objects/info/alternates
file.
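As a rough illustration of that manual fix, here is a minimal Python sketch that adds the common repository's objects directory to one user's alternates file. The function name add_alternate and the example paths are hypothetical; it assumes the alternates format of one objects-directory path per line, and it leaves running 'git gc' to you.

```python
import os


def add_alternate(repo_git_dir: str, common_objects_dir: str) -> bool:
    """Record common_objects_dir in <repo_git_dir>/objects/info/alternates.

    Returns True if the file was updated, False if the path was already listed.
    """
    # Alternates entries name an *objects* directory; store an absolute path.
    target = os.path.abspath(common_objects_dir)
    info_dir = os.path.join(repo_git_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)  # create it if it doesn't exist
    alternates = os.path.join(info_dir, "alternates")

    # Read any entries already present, ignoring blank lines.
    entries = []
    if os.path.exists(alternates):
        with open(alternates, encoding="utf-8") as f:
            entries = [line.strip() for line in f if line.strip()]
    if target in entries:
        return False

    # Rewrite the file with one path per line, ending in a newline.
    with open(alternates, "w", encoding="utf-8") as f:
        f.write("\n".join(entries + [target]) + "\n")
    return True
```

For example, add_alternate("/home/alice/project/.git", "/srv/git/common/.git/objects") (hypothetical paths), followed by 'git gc' in that repository. The sketch rewrites the whole file rather than appending, so a new entry is never glued onto an existing last line that lacks a trailing newline.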
> I was thinking it should be possible to hardlink all of the objects in
> the different repos to a canonical single copy.
>
> Would I be correct in thinking that if I have two repos with an
> equivalent .git/objects/../..... file in them, the files are
> necessarily identical and one can be replaced by a hardlink to the
> other?
Yes, you could do that. However, you'll save very little by doing so,
as the bulk of a repository's content is normally stored in pack files,
and those may differ from one repository to another depending on what
exactly each pack contains. The alternates mechanism is more powerful as
it lets Git use objects from the canonical repository whether they are
packed or not. More importantly, it avoids creating local copies of new
objects if they already exist in that canonical copy, which means that
you don't have to constantly search every user's repository for
potential new objects to hardlink.
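The reason the answer is yes: Git names every object by a hash of its content, and a loose object's file name is that hash. A minimal sketch, assuming a default SHA-1 repository, of how a blob's loose-object path is derived; the helper name loose_object_path is hypothetical. On disk the file is zlib-compressed, so two copies may not be byte-identical if compression settings differ, but both decompress to the same object.

```python
import hashlib
import os


def loose_object_path(data: bytes, obj_type: str = "blob") -> str:
    """Return where Git would store this object loose, relative to .git.

    The object id is the SHA-1 of the header "<type> <size>\\0" followed by
    the raw content; the loose file lives at objects/<2 hex>/<38 hex>.
    """
    header = f"{obj_type} {len(data)}\0".encode()
    oid = hashlib.sha1(header + data).hexdigest()
    return os.path.join("objects", oid[:2], oid[2:])
```

For instance, loose_object_path(b"hello\n") gives objects/ce/013625030ba8dba906f756967f9e9ca394464a (on a POSIX system), matching the id 'git hash-object' reports for that content. Since the path is a function of the content alone, the same path in two repositories means the same object.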
> If this is correct then is there some tool known to the list that
> already does this? I whipped this together:
The "tool" already exists in Git and is what I describe above. The
actual tool you might need is probably a script to populate that
.git/objects/info/alternates file in all your users' repositories and
maybe run 'git gc' on their behalf.
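A minimal sketch of such a script in Python, assuming every clone lives at a path matching a glob such as /home/*/project/.git (hypothetical) and that the common repository's objects directory is known. The function name share_common_objects and the run_gc flag are illustrative. If it is run as root, any files 'git gc' creates would be owned by root, so running the gc step as each repository's owner is probably preferable; that is why gc is off by default here.

```python
import glob
import os
import subprocess


def share_common_objects(home_glob: str, common_objects_dir: str,
                         run_gc: bool = False) -> list[str]:
    """Add common_objects_dir to the alternates of every repo matching home_glob.

    home_glob is a hypothetical pattern such as "/home/*/project/.git".
    Returns the .git directories whose alternates file was changed.
    """
    common = os.path.abspath(common_objects_dir)
    changed = []
    for git_dir in sorted(glob.glob(home_glob)):
        objects = os.path.join(git_dir, "objects")
        if not os.path.isdir(objects):
            continue  # not a repository; skip it
        if os.path.abspath(objects) == common:
            continue  # never make the common repository borrow from itself
        alternates = os.path.join(objects, "info", "alternates")
        entries = []
        if os.path.exists(alternates):
            with open(alternates, encoding="utf-8") as f:
                entries = [line.strip() for line in f if line.strip()]
        if common in entries:
            continue  # already set up
        os.makedirs(os.path.dirname(alternates), exist_ok=True)
        with open(alternates, "w", encoding="utf-8") as f:
            f.write("\n".join(entries + [common]) + "\n")
        changed.append(git_dir)
        if run_gc:
            # Repack so the repository can drop what the common store provides.
            subprocess.run(["git", f"--git-dir={git_dir}", "gc"], check=True)
    return changed
```

Calling it twice is harmless: the second run finds the path already listed and returns an empty list.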
Nicolas