From mboxrd@z Thu Jan 1 00:00:00 1970 From: "George Spelvin" Subject: Re: is hosting a read-mostly git repo on a distributed file system practical? Date: 12 Apr 2011 23:47:15 -0400 Message-ID: <20110413034715.15553.qmail@science.horizon.com> Cc: git@vger.kernel.org, linux@horizon.com, spearce@spearce.org To: jon.seymour@gmail.com X-From: git-owner@vger.kernel.org Wed Apr 13 05:47:30 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Q9r3B-0006sU-EC for gcvg-git-2@lo.gmane.org; Wed, 13 Apr 2011 05:47:29 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757043Ab1DMDrR (ORCPT ); Tue, 12 Apr 2011 23:47:17 -0400 Received: from science.horizon.com ([71.41.210.146]:29483 "HELO science.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1756834Ab1DMDrQ (ORCPT ); Tue, 12 Apr 2011 23:47:16 -0400 Received: (qmail 15554 invoked by uid 1000); 12 Apr 2011 23:47:15 -0400 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: > All clients, including the client that occasionally updates the > read-mostly repo would be mounting the DFS as a local file system. My > environment is one where DFS is easy, but establishing a shared server > is more complicated (ie. bureaucratic). > I guess I am prepared to put up with a slow initial clone (my developer > pool will be relatively stable and pulling from a peer via git: or ssh: > will usually be acceptable for this occasional need). > What I am most interested in is the incremental performance. Can my > integrator, who occasionally updates the shared repo, avoid automatically > repacking it (and hence taking the whole of repo latency hit) and can > my developers who are pulling the updates do so reliably without a whole > of repo scan? I think the answers are yes, but I have to make a vouple of things clear: * You can *definitely* control repack behaviour. .keep files are the simplest way to prevent repacking. * Are you talking about hosting only a "bare" repository, or one with the unpacked source tree as well? If you try to run git commands on a large network-mounted source tree, things can get more than a bit sluggish; git recursively stats the whole tree fairly frequently. (There are ways to precent that, notably core.ignoreStat, but they make it less friendly.) * You can clone from a repository mounted on the file system just as easily as you can from a network server. So there's no need to set up a server if you find it onconvenient. * Normally, the developers will clone from the integrator's repository before doing anything, so the source tree, and any changes they make, will be local. * A local clone will try to hard link to the object directory. I think it will copy them if it fails, or you can force that with "git clone --no-hardlinks". For a more space-saving version, try "git clone -s", which will make a sort of soft link to the upstream repository. It's a git concept, so repacking upstream won't do any harm, but you Must Not delete objects from the upstream repository or you'll create dangling references in the downstream. * If using the objects on the DFS mount turns out to be slow, you can just do the initial clone with --no-hardlinks. Then the developers' day-to-day work is all local. Indeed, you could easily do everything via DFS. Give everyone a personal "public" repo to push to, which is read-only to everyone else, and let the integrator pull from those. > I understand that avoiding repacking for an extended period brings its > own problems, so I guess I could live with a local repack followed by > an rsync transfer to re-initial the shared remote, if this was > warranted. Normally, you do a generational garbage collection thing. You repack the current work frequently (which is fast to do, and to share, because it's small), and the larger, slower, older packs less frequently. Anyway, I hope this helps!