git.vger.kernel.org archive mirror
* is hosting a read-mostly git repo on a distributed file system practical?
@ 2011-04-13  1:40 Jon Seymour
  2011-04-13  2:06 ` Shawn Pearce
  0 siblings, 1 reply; 5+ messages in thread
From: Jon Seymour @ 2011-04-13  1:40 UTC (permalink / raw)
  To: Git Mailing List

Is it practical to host a read-mostly git repo on a WAN-based
distributed file system?

The idea is that most developers would use the DFS-based repo to track
the tip of the development stream, but only the integrator would
publish updates to the DFS-based repo.

As such, the need to repack the DFS-based repo will be somewhat, but
not completely, reduced.

Is this going to be practical, or are whole-of-repo operations
eventually going to kill me because of latency and bandwidth issues
associated with use of the DFS?

Are there things I can do with the git configuration (such as limiting
repacking behaviour) that will help?

jon.

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: is hosting a read-mostly git repo on a distributed file system practical?
@ 2011-04-13  3:47 George Spelvin
  2011-04-13  4:57 ` Jon Seymour
  0 siblings, 1 reply; 5+ messages in thread
From: George Spelvin @ 2011-04-13  3:47 UTC (permalink / raw)
  To: jon.seymour; +Cc: git, linux, spearce

> All clients, including the client that occasionally updates the
> read-mostly repo would be mounting the DFS as a local file system. My
> environment is one where DFS is easy, but establishing a shared server
> is more complicated (ie. bureaucratic).

> I guess I am prepared to put up with a slow initial clone (my developer
> pool will be relatively stable and pulling from a peer via git: or ssh:
> will usually be acceptable for this occasional need).

> What I am most interested in is the incremental performance. Can my
> integrator, who occasionally updates the shared repo, avoid automatically
> repacking it (and hence taking the whole of repo latency hit) and can
> my developers who are pulling the updates do so reliably without a whole
> of repo scan?

I think the answers are yes, but I have to make a couple of things clear:
* You can *definitely* control repack behaviour.  .keep files are the
  simplest way to prevent repacking.
* Are you talking about hosting only a "bare" repository, or one with
  the unpacked source tree as well?  If you try to run git commands on
  a large network-mounted source tree, things can get more than a bit
  sluggish; git recursively stats the whole tree fairly frequently.
  (There are ways to prevent that, notably core.ignoreStat, but they
  make it less friendly.)
* You can clone from a repository mounted on the file system just as
  easily as you can from a network server.  So there's no need to set
  up a server if you find it inconvenient.
* Normally, the developers will clone from the integrator's repository
  before doing anything, so the source tree, and any changes they make,
  will be local.
* A local clone will try to hard link the object files.  I think it
  will copy them if that fails, or you can force copying with "git clone
  --no-hardlinks".  For a more space-saving version, try "git clone
  -s", which makes a sort of soft link to the upstream repository
  (via objects/info/alternates).  The sharing happens at the git level,
  so repacking upstream won't do any harm, but you must not delete
  objects from the upstream repository or you'll leave dangling
  references in the downstream.
* If using the objects on the DFS mount turns out to be slow, you can
  just do the initial clone with --no-hardlinks.  Then the developers'
  day-to-day work is all local.

Indeed, you could easily do everything via DFS.  Give everyone a personal
"public" repo to push to, which is read-only to everyone else, and let
the integrator pull from those.
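The clone variants mentioned above can be sketched like this, with
throwaway repos and illustrative names:

```shell
set -e
cd "$(mktemp -d)"
git init -q upstream
git -C upstream -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m init
# Shared clone: no object copy at all; the downstream borrows upstream's
# objects through .git/objects/info/alternates:
git clone -q -s upstream shared
# Fully independent clone: copy the objects rather than hard-linking them,
# so day-to-day work never touches the shared object store:
git clone -q --no-hardlinks upstream independent
```

For the DFS case, "upstream" would be the repo on the mount, and the
clones would live on each developer's local disk.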

> I understand that avoiding repacking for an extended period brings its
> own problems, so I guess I could live with a local repack followed by
> an rsync transfer to re-initialize the shared remote, if this was
> warranted.

Normally, you take a generational garbage-collection approach.  You
repack the current work frequently (which is fast to do, and to share,
because it's small), and the larger, older packs much less frequently.
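A sketch of that generational pattern, again with a throwaway repo
(the policy is the point here, not the exact commands):

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m base
# Old generation: pack everything once, then protect that pack with .keep:
git repack -a -d -q
for p in .git/objects/pack/pack-*.pack; do touch "${p%.pack}.keep"; done
# New work accumulates as loose objects...
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m new
# ...and a frequent, cheap incremental repack packs only the new objects;
# the kept pack is untouched, so only the small new pack changes on disk:
git repack -d -q
```

After this there are two packs: the big kept one and a small fresh one,
which is the only thing that has to move over the slow link.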

Anyway, I hope this helps!


end of thread, other threads:[~2011-04-13  4:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-13  1:40 is hosting a read-mostly git repo on a distributed file system practical? Jon Seymour
2011-04-13  2:06 ` Shawn Pearce
2011-04-13  2:29   ` Jon Seymour
  -- strict thread matches above, loose matches on Subject: below --
2011-04-13  3:47 George Spelvin
2011-04-13  4:57 ` Jon Seymour

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).