From: Jon Seymour <jon.seymour@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: is hosting a read-mostly git repo on a distributed file system practical?
Date: Wed, 13 Apr 2011 12:29:32 +1000
Message-ID: <BANLkTimvRoj_dop-s=RUdQBENN6Es_TBsA@mail.gmail.com>
In-Reply-To: <BANLkTimPYchTXiMpnmE47kxiXvJ_c6QZ9Q@mail.gmail.com>
On Wed, Apr 13, 2011 at 12:06 PM, Shawn Pearce <spearce@spearce.org> wrote:
> On Tue, Apr 12, 2011 at 21:40, Jon Seymour <jon.seymour@gmail.com> wrote:
>> The idea is that most developers would use the DFS-based repo to track
>> the tip of the development stream, but only the integrator would
>> publish updates to the DFS-based repo.
>>
>> As such, the need to repack the DFS-based repo will be somewhat, but
>> not completely, reduced.
>
> Serving git clone is basically a repack operation when run over
> git://, http:// or SSH. If the DFS was mounted as a local filesystem,
> git clone would turn into a cpio to copy the directory contents. I'm
> not sure if that is what you are suggesting to do here or not.
>
All clients, including the client that occasionally updates the
read-mostly repo, would mount the DFS as a local file system. My
environment is one where DFS is easy, but establishing a shared server
is more complicated (i.e. bureaucratic).

I guess I am prepared to put up with a slow initial clone (my developer
pool will be relatively stable, and pulling from a peer via git: or ssh:
will usually be acceptable for this occasional need).

What I am most interested in is the incremental performance. Can my
integrator, who occasionally updates the shared repo, avoid
automatically repacking it (and hence taking the whole-of-repo latency
hit), and can my developers who are pulling the updates do so reliably
without a whole-of-repo scan?
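
As a sketch of the first half of that question: automatic repacking can
be switched off per repository through configuration. The helper below
is illustrative only. The function name and the repository path are
hypothetical, and it assumes the updates arrive by push, which would
otherwise run "git gc --auto" on the receiving side. gc.auto and
receive.autogc are existing git config keys.

```shell
# Sketch only: disable automatic gc/repack in one repository.
# Usage: disable_auto_repack /dfs/mount/shared.git   (hypothetical path)
# For a non-bare repository, pass the path to its .git directory.
disable_auto_repack() {
    repo=$1
    # gc.auto=0 turns off the heuristics that make "git gc --auto" do work.
    git --git-dir="$repo" config gc.auto 0 &&
    # receive.autogc=false stops receive-pack from running gc after a push.
    git --git-dir="$repo" config receive.autogc false
}
```

This only addresses the write side; it says nothing about how much of
the repo a reader has to touch during a fetch.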
>> Is this going to be practical, or are whole of repo operations
>> eventually going to kill me because of latency and bandwidth issues
>> associated with use of the DFS?
>
> Latency is a problem. The Git pack file has decent locality, but there
> are some things that could still stand to be improved. It really
> doesn't work well unless the pack is held completely in the machine's
> memory.

I understand that avoiding repacking for an extended period brings its
own problems, so I guess I could live with a local repack followed by
an rsync transfer to re-initialize the shared remote, if this were
warranted.

I agree, there is no substitute for testing this, but the experience of
others can be helpful in deciding whether it is even worth attempting.
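
Roughly, the repack-then-rsync step could look like the sketch below.
The function name and both paths are hypothetical, and it assumes the
integrator keeps a full copy of the repo on local disk so that the
repack itself never touches the DFS.

```shell
# Sketch only: repack on local disk, then mirror the result to the DFS mount.
# Usage: republish_shared_repo /local/scratch/repo.git /dfs/mount/repo.git
# (both paths are hypothetical)
republish_shared_repo() {
    local_repo=$1
    shared_repo=$2
    # -a -d: pack all referenced objects into one pack and delete the
    # packs that this makes redundant.
    git --git-dir="$local_repo" repack -a -d -q &&
    # -a copies the tree recursively, preserving attributes; --delete
    # removes files on the DFS side that no longer exist locally
    # (old packs, loose objects).
    rsync -a --delete "$local_repo"/ "$shared_repo"/
}
```

One caveat with this sketch: --delete can remove an old pack while a
developer is reading from it, so it is probably only safe to run when
the shared repo is quiet.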
>
> --
> Shawn.
>