From: "Shawn O. Pearce"
Subject: Re: q: git-fetch a tad slow?
Date: Mon, 28 Jul 2008 22:50:14 -0700
Message-ID: <20080729055014.GE11947@spearce.org>
References: <20080728160138.GA12777@elte.hu>
In-Reply-To: <20080728160138.GA12777@elte.hu>
To: Ingo Molnar
Cc: git@vger.kernel.org

Ingo Molnar wrote:
>
> Setup/background: distributed kernel testing cluster, [...]
>
> Problem: i noticed that git-fetch is a tad slow:
>
>   titan:~/tip> time git-fetch
>   real    0m2.372s
>
> There are hundreds of branches, so i thought fetching a single branch
> alone would improve things:
>
>   titan:~/tip> time git-fetch origin master
>   real    0m0.942s
>
> But that's still slow - so i use a (lame) ad-hoc script instead:
>
>   titan:~/tip> time tip-fetch
>   real    0m0.246s

OK, yes, when there are _many_ branches like that, limiting fetch to
only the branch(es) you must have can make it go much faster.  Part of
the problem is we loop over the branches many times, and those are
O(N) loops (N=number of branches).  We could do better, but we don't.

One reason your tip-fetch runs so much faster is that it doesn't have
to enumerate the hundreds of advertised branches offered up by the
remote peer to find the one you want to fetch.  Your tip-fetch is
reading only that one ref file (.git/refs/heads/master) and that's
pretty much it.

In contrast, git-upload-pack on the server side must open and read
_all_ ref files under .git/refs/ and send them to the client, which
then has to loop over them at least twice before it can decide if a
match exists.  That's a lot more data to shove down over SSH.
Granted, it's only 42 bytes + refname per ref, but it's still more.

Those O(N) loops I referred to earlier explain why it gets ugly with
hundreds of branches.  That turns into an O(N^2) matching algorithm.
Not pretty.  A simple hash would solve a lot of that, bringing the
first time of 0m2.372s much closer to the second time of 0m0.942s.
Neither of which can compete with your tip-fetch.

Have you tried using git-pack-refs to pack the branches on the remote
repository?  If you update all of the branches, run
`git pack-refs --all --prune`, then allow the testing clients to start
fetching, it may go much quicker.
The pack-refs command moves all of the individual ref files into the
single .git/packed-refs file, reducing the number of files we need to
open and read to service a single fetch client.

I wonder if git-pack-refs + fetching only a single branch will get you
closer to the tip-fetch time.

Also, I wonder if you really need to fetch over SSH.  Doing a fetch
over git:// is much quicker, as there is no SSH session setup
overhead.

-- 
Shawn.
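
To make the O(N^2) matching point above concrete, here is a rough
Python sketch of the difference between scanning the advertised refs
once per wanted ref and building a single hash table first.  This is
not git's actual C code: the function names and the fake ref
advertisement are made up purely for illustration.

```python
def match_refs_nested(wanted, advertised):
    """Quadratic-style matching: rescan the advertised list per wanted ref."""
    found = {}
    for name in wanted:
        for adv_name, sha in advertised:
            if adv_name == name:
                found[name] = sha
                break
    return found


def match_refs_hashed(wanted, advertised):
    """Linear matching: build one lookup table, then probe it per wanted ref."""
    by_name = dict(advertised)
    return {name: by_name[name] for name in wanted if name in by_name}


# Hypothetical advertisement: 500 branches with fake 40-hex object names.
advertised = [("refs/heads/branch-%d" % i, "%040x" % i) for i in range(500)]
advertised.append(("refs/heads/master", "%040x" % 0xABC))

# One of the wanted refs does not exist on the (pretend) remote.
wanted = ["refs/heads/master", "refs/heads/branch-7", "refs/heads/missing"]

# Both strategies agree on the result; only the amount of work differs.
assert match_refs_nested(wanted, advertised) == match_refs_hashed(wanted, advertised)
```

With only a handful of wanted refs the nested version is harmless; it
is when both lists run into the hundreds, as with a default fetch of
every branch, that the repeated scans start to add up.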