From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?ISO-8859-1?Q?Kenneth_=D6lwing?= Subject: Re: Collective wisdom about repos on NFS accessed by concurrent clients (== corruption!?) Date: Fri, 05 Apr 2013 14:35:40 +0200 Message-ID: <515EC51C.9070206@olwing.se> References: <515419D0.7030107@olwing.se> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE To: Git List X-From: git-owner@vger.kernel.org Sat Apr 06 19:21:48 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1UOWH0-0002u6-Pr for gcvg-git-2@plane.gmane.org; Sat, 06 Apr 2013 18:47:27 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161382Ab3DEMg0 convert rfc822-to-quoted-printable (ORCPT ); Fri, 5 Apr 2013 08:36:26 -0400 Received: from hotelroom5.mainloop.net ([192.121.13.73]:42608 "EHLO hotelroom5.mainloop.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161279Ab3DEMgY (ORCPT ); Fri, 5 Apr 2013 08:36:24 -0400 X-No-Relay: not in my network Received: from [IPv6:::1] (unknown [82.214.25.167]) by hotelroom5.mainloop.net (Postfix) with ESMTPSA id 4AFE47100E for ; Fri, 5 Apr 2013 14:36:14 +0200 (CEST) User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 In-Reply-To: <515419D0.7030107@olwing.se> X-Antivirus: AVG for E-mail 2013.0.2904 [2641/6225] Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Hi Basically, I'm at a place where I'm considering giving up getting this=20 to work reliably. In general, my setup work really fine, except for the= =20 itty-bitty detail that when I put pressure on things I tend to get into= =20 various kinds of trouble with the central repo being corrupted. Can anyone authoritatively state anything either way? TIA, ken1 On 2013-03-28 11:22, Kenneth =D6lwing wrote: > Hi, > > I'm hoping to hear some wisdom on the subject so I can decide if I'm=20 > chasing a pipe dream or if it should be expected to work and I just=20 > need to work out the kinks. > > Finding things like this makes it sound possible: > http://permalink.gmane.org/gmane.comp.version-control.git/122670 > but then again, in threads like this: > http://kerneltrap.org/mailarchive/git/2010/11/14/44799 > opinions are that it's just not reliable to trust. > > My experience so far is that I eventually get repo corruption when I=20 > stress it with concurrent read/write access from multiple hosts=20 > (beyond any sort of likely levels, but still). Maybe I'm doing=20 > something wrong, missing a configuration setting somewhere, put an=20 > unfair stress on the system, there's a bona fide bug - or, given the=20 > inherent difficulty in achieving perfect coherency between machines o= n=20 > what's visible on the mount, it's just impossible (?) to truly get it= =20 > working under all situations. > > My eventual usecase is to set up a bunch of (gitolite) hosts that all= =20 > are effectively identical and all seeing the same storage and clients= =20 > can then be directed to any of them to get served. For the purpose of= =20 > this query I assume it can be boiled down to be the same as n users=20 > working on their own workstations and accessing a central repo on an=20 > NFS share they all have mounted, using regular file paths. Correct? > > I have a number of load-generating test scripts (that from humble=20 > beginnings have grown to beasts), but basically, they fork a number o= f=20 > code pieces that proceed to hammer the repo with concurrent reading=20 > and writing. If necessary, the scripts can be started on different=20 > hosts, too. It's all about the central repo so clients will retry in=20 > various modes whenever they get an error, up to re-cloning and=20 > starting over. All in all, they can yield quite a load. > > On a local filesystem everything seems to be holding up fine which is= =20 > expected. In the scenario with concurrency on top of shared NFS=20 > storage however, the scripts eventually fails with various problems=20 > (when the timing finally finds a hole, I guess) - possible to clone=20 > but checkouts fails, errors about refs pointing wrong and errors wher= e=20 > the original repo is actually corrupted and can't be cloned from.=20 > Depending on test strategy, I've also had everything going fine (ops=20 > looks fine in my logs), but afterwards the repo has lost a branch or = two. > > I've experimented with various NFS settings (e.g. turning off the=20 > attribute cache), but haven't reached a conclusion. I do suspect that= =20 > it just is a fact of life with a remote filesystem to have coherency=20 > problems with high concurrency like this but I'd be happily proven=20 > wrong - I'm not an expert in either NFS or git. > > So, any opionions either way would be valuable, e.g. 'it should work'= =20 > or 'it can never work 100%' is fine, as well as any suggestions.=20 > Obviously, the concurrency needed to make it probable to hit this=20 > seems so unlikely that maybe I just shouldn't worry... > > I'd be happy to share scripts and whatever if someone would like to=20 > try it out themselves. > > Thanks for your time, > > ken1 > > --=20 > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html