git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@horizon.com>
To: jon.seymour@gmail.com
Cc: git@vger.kernel.org, linux@horizon.com, spearce@spearce.org
Subject: Re: is hosting a read-mostly git repo on a distributed file system practical?
Date: 12 Apr 2011 23:47:15 -0400	[thread overview]
Message-ID: <20110413034715.15553.qmail@science.horizon.com> (raw)

> All clients, including the client that occasionally updates the
> read-mostly repo would be mounting the DFS as a local file system. My
> environment is one where DFS is easy, but establishing a shared server
> is more complicated (ie. bureaucratic).

> I guess I am prepared to put up with a slow initial clone (my developer
> pool will be relatively stable and pulling from a peer via git: or ssh:
> will usually be acceptable for this occasional need).

> What I am most interested in is the incremental performance. Can my
> integrator, who occasionally updates the shared repo, avoid automatically
> repacking it (and hence taking the whole of repo latency hit) and can
> my developers who are pulling the updates do so reliably without a whole
> of repo scan?

I think the answers are yes, but I have to make a vouple of things clear:
* You can *definitely* control repack behaviour.  .keep files are the
  simplest way to prevent repacking.
* Are you talking about hosting only a "bare" repository, or one with
  the unpacked source tree as well?  If you try to run git commands on
  a large network-mounted source tree, things can get more than a bit
  sluggish; git recursively stats the whole tree fairly frequently.
  (There are ways to precent that, notably core.ignoreStat, but they
  make it less friendly.)
* You can clone from a repository mounted on the file system just as
  easily as you can from a network server.  So there's no need to set
  up a server if you find it onconvenient.
* Normally, the developers will clone from the integrator's repository
  before doing anything, so the source tree, and any changes they make,
  will be local.
* A local clone will try to hard link to the object directory.  I think
  it will copy them if it fails, or you can force that with "git clone
  --no-hardlinks".  For a more space-saving version, try "git clone
  -s", which will make a sort of soft link to the upstream repository.
  It's a git concept, so repacking upstream won't do any harm, but you
  Must Not delete objects from the upstream repository or you'll create
  dangling references in the downstream.
* If using the objects on the DFS mount turns out to be slow, you can
  just do the initial clone with --no-hardlinks.  Then the developers'
  day-to-day work is all local.

Indeed, you could easily do everything via DFS.  Give everyone a personal
"public" repo to push to, which is read-only to everyone else, and let
the integrator pull from those.

> I understand that avoiding repacking for an extended period brings its
> own problems, so I guess I could live with a local repack followed by
> an rsync transfer to re-initial the shared remote, if this was
> warranted.

Normally, you do a generational garbage collection thing.  You repack the
current work frequently (which is fast to do, and to share, because
it's small), and the larger, slower, older packs less frequently.

Anyway, I hope this helps!

             reply	other threads:[~2011-04-13  3:47 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-13  3:47 George Spelvin [this message]
2011-04-13  4:57 ` is hosting a read-mostly git repo on a distributed file system practical? Jon Seymour
  -- strict thread matches above, loose matches on Subject: below --
2011-04-13  1:40 Jon Seymour
2011-04-13  2:06 ` Shawn Pearce
2011-04-13  2:29   ` Jon Seymour

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110413034715.15553.qmail@science.horizon.com \
    --to=linux@horizon.com \
    --cc=git@vger.kernel.org \
    --cc=jon.seymour@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).