From: Junio C Hamano
Subject: Re: server-info dumbing-down
Date: Wed, 07 Dec 2005 14:45:58 -0800
Message-ID: <7vzmncmspl.fsf@assigned-by-dhcp.cox.net>
References: <20051207215853.GL22159@pasky.or.cz>
In-Reply-To: <20051207215853.GL22159@pasky.or.cz> (Petr Baudis's message of "Wed, 7 Dec 2005 22:58:53 +0100")
To: Petr Baudis
Cc: git@vger.kernel.org

Petr Baudis writes:

> I've noticed few commits from Dec 4 landing into the git repository,
> which remove various computations and corresponding lines from the
> server info (3e15c67 and few ancestors). I'm curious about this - were
> the computations that hugely computationally expensive?
> If not, wouldn't it be better to leave it in for future use (since it
> doesn't cost a lot) rather than making the future deployment of
> anything using this data much harder since the server infos won't
> have it anymore?

T and D lines were expensive.  Very expensive.  They are not used by
existing Porcelains.

It is doubtful whether those lines are useful.  The information they
attempted to give Porcelains was designed well before dumb-transport
clients were completed, and what those clients _might_ need was pure
guesswork.  For example, I did not foresee that dumb-transport clients
would grab all the .idx files to see for themselves which packs are
needed, without consulting T lines (which turns out to be the right
thing to do anyway).  Once they have the .idx files, clients can do a
better job of picking the best pack to fetch on their own, without
help from the T and D lines.

For example, if we implement the "staggered overlapping packs" you
suggested, the clients will face a choice when walking the commit
chain.  Two packs may both contain the object currently being sought.
Which one to pick?  One strategy would be to pick the one that
contains the fewest objects we already have.  Another would be to pick
the one that contains the most objects we do not have yet.  Either can
be done with only the .idx files, and you need the .idx files for both
packs to realize that you have a choice to begin with.

IIRC, in your "staggered packs" approach, some recent objects are left
unpacked and are also in the latest pack.  Clients that have all the
objects in the latest pack are better off walking individual commits,
while other clients that are way behind are better off fetching the
pack.  To help them, we would need to describe the object database
differently from the way objects/info/packs attempted with those T and
D lines.
We need to say "if you do not have these objects, do not walk
individual commits beyond this commit, even though they are available
as loose objects, because you are better off grabbing this pack
instead".

For these reasons, I feel that the whole thing should be stripped down
first.  Improvements to the dumb-transport clients may need additional
information to be computed by server-info, but it is far more likely
than not that such information would be quite different from what the
T and D lines were giving them.

The repacking strategy, the repacking program that prepares the
repository to be helpful to dumb-transport clients, the logic in the
clients to take advantage of that repacking strategy, and the
additional information server-info supplies to help that happen need
to be designed together, and in this order.  The old T/D lines were
developed in the wrong order: we did not know what the best repacking
strategy was (and I suspect we still don't), and these lines were
written without knowing whether they would be useful.
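
To make the two pack-picking strategies mentioned earlier concrete,
here is a rough Python sketch.  It is only an illustration, not what
any existing client does.  It assumes the client has already fetched
the .idx files and reduced each one to a plain set of object names;
the function names, the pack names and the short object names ("c1",
"c2", ...) are all made up for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for parsed .idx files: pack name -> object names.
PackIndex = Dict[str, Set[str]]


def candidate_packs(wanted: str, packs: PackIndex) -> List[str]:
    """Return the packs that contain the wanted object, in name order."""
    return sorted(name for name, objects in packs.items() if wanted in objects)


def pick_fewest_already_present(
    wanted: str, packs: PackIndex, have: Set[str]
) -> Optional[str]:
    """Strategy 1: the pack with the fewest objects we already have."""
    candidates = candidate_packs(wanted, packs)
    if not candidates:
        return None
    return min(candidates, key=lambda name: len(packs[name] & have))


def pick_most_missing(
    wanted: str, packs: PackIndex, have: Set[str]
) -> Optional[str]:
    """Strategy 2: the pack with the most objects we do not have yet."""
    candidates = candidate_packs(wanted, packs)
    if not candidates:
        return None
    return max(candidates, key=lambda name: len(packs[name] - have))


# Made-up example: two overlapping packs that both contain "c4".
packs: PackIndex = {
    "pack-a": {"c1", "c2", "c3", "c4"},
    "pack-b": {"c4", "c5", "c6", "c7", "c8", "c9", "c10"},
}
have: Set[str] = {"c1", "c5", "c6"}

fewest_present = pick_fewest_already_present("c4", packs, have)
most_missing = pick_most_missing("c4", packs, have)
```

With this made-up data the two strategies disagree: pack-a duplicates
only one object we already have, while pack-b brings in more objects
we are missing.  Both decisions need the contents of both packs, which
is why the client has to hold both .idx files before it even knows
there is a choice.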