From: Jeff King
Subject: Re: [PATCH] convert filter: supply path to external driver
Date: Sun, 19 Dec 2010 21:24:09 -0500
Message-ID: <20101220022409.GA4625@sigill.intra.peff.net>
References: <20101218223822.GA18902@arf.padd.com>
 <20101219212925.GA7393@arf.padd.com>
 <7vzks1e84p.fsf@alter.siamese.dyndns.org>
In-Reply-To: <7vzks1e84p.fsf@alter.siamese.dyndns.org>
To: Junio C Hamano
Cc: Pete Wyckoff, git@vger.kernel.org

On Sun, Dec 19, 2010 at 01:59:50PM -0800, Junio C Hamano wrote:

> Modulo one worry. Don't we have, or don't we at least plan to allow us
> to have, a facility to cache expensive blob conversion results, similar
> to the textconv caching? How would this change interact with two blobs
> that live in different paths?

Yeah, it has been talked about, but I don't think anyone is working on
it (I don't personally use clean/smudge at all, so it is not something
I have thought _that_ much about).

This does definitely complicate matters, as the filtering is no longer
a pure mapping of sha1->sha1. However, I think in practice we could do
just fine by using a multi-level lookup: map the sha1 of the blob to be
filtered to a tree, where each tree entry represents the remaining
cache parameters. In this case, the only other parameter we have is the
path given to the filter (but the scheme could easily be extended to
include other parameters, if they existed, in this or other caching
cases).

We only get a high-performance lookup for the first part of the
multi-level key (i.e., the sha1), but that's OK if we assume the number
of second-level items is going to be small, which I think is the case
here (a sha1 will tend to be found under only one or a few names).

An alternative would be to combine all parts of the filter input under
a single lookup key, e.g., calculate and store under
sha1(sha1(blob) + filename). But that means the notes keys are not
actual object sha1s, which throws off pruning.

Anyway, that's just my quick thinking on the subject. I don't see any
reason to restrict a feature just because we might want to cache it in
the future. At the very worst, we could always cache filters which do
not use %s, and make only %s users pay the penalty.

-Peff
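
P.S. Here is a rough sketch of the two keying schemes, just to make the
comparison concrete. It is illustrative Python only, nothing that
exists in git; a plain dict stands in for the notes tree, and the names
combined_key, store, and lookup are made up for the example.

  import hashlib

  # Single-key scheme: hash the blob sha1 together with the path.
  # Simple, but the resulting key is not the sha1 of any real object,
  # so reachability-based pruning of the notes cache cannot see it.
  def combined_key(blob_sha1, path):
      return hashlib.sha1((blob_sha1 + path).encode()).hexdigest()

  # Multi-level scheme: key the first level by the blob sha1 itself
  # (a real object name, so pruning still works), and keep a small
  # second-level map from the remaining parameters -- here just the
  # path -- to the cached filter output.
  cache = {}

  def store(blob_sha1, path, filtered):
      cache.setdefault(blob_sha1, {})[path] = filtered

  def lookup(blob_sha1, path):
      return cache.get(blob_sha1, {}).get(path)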