From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap Date: Tue, 26 Aug 2014 13:54:28 -0400 Message-ID: <20140826175428.GA17546@peff.net> References: <1408896466-23149-1-git-send-email-prohaska@zib.de> <1408896466-23149-5-git-send-email-prohaska@zib.de> <20140825124323.GB17288@peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: Junio C Hamano , Git Mailing List , pclouds@gmail.com, john@keeping.me.uk, schacon@gmail.com To: Steffen Prohaska X-From: git-owner@vger.kernel.org Tue Aug 26 19:54:35 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1XMKx1-0008QY-5T for gcvg-git-2@plane.gmane.org; Tue, 26 Aug 2014 19:54:35 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752938AbaHZRyb (ORCPT ); Tue, 26 Aug 2014 13:54:31 -0400 Received: from cloud.peff.net ([50.56.180.127]:59412 "HELO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752540AbaHZRya (ORCPT ); Tue, 26 Aug 2014 13:54:30 -0400 Received: (qmail 3515 invoked by uid 102); 26 Aug 2014 17:54:30 -0000 Received: from c-71-63-4-13.hsd1.va.comcast.net (HELO sigill.intra.peff.net) (71.63.4.13) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Tue, 26 Aug 2014 12:54:30 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 26 Aug 2014 13:54:28 -0400 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Mon, Aug 25, 2014 at 06:55:51PM +0200, Steffen Prohaska wrote: > It could be handled that way, but we would be back to the original problem > that 32-bit git fails for large files. The convert code path currently > assumes that all data is available in a single buffer at some point to apply > crlf and ident filters. > > If the initial filter, which is assumed to reduce the file size, fails, we > could seek to 0 and read the entire file. But git would then fail for large > files with out-of-memory. We would not gain anything for the use case that > I describe in the commit message's first paragraph. Ah. So the real problem is that we cannot handle _other_ conversions for large files, and we must try to intercept the data before it gets to them. So this is really just helping "reduction" filters. Even if our streaming filter succeeds, it does not help the situation if it did not reduce the large file to a smaller one. It would be nice in the long run to let the other filters stream, too, but that is not a problem we need to solve immediately. Your patch is a strict improvement. Thanks for the explanation; your approach makes a lot more sense to me now. -Peff