From: Jeff King <peff@peff.net>
To: Steffen Prohaska <prohaska@zib.de>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>,
	pclouds@gmail.com, john@keeping.me.uk, schacon@gmail.com
Subject: Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
Date: Tue, 26 Aug 2014 13:54:28 -0400
Message-ID: <20140826175428.GA17546@peff.net>
In-Reply-To: <E23693B7-0D9D-477D-A303-4A68433EAB79@zib.de>

On Mon, Aug 25, 2014 at 06:55:51PM +0200, Steffen Prohaska wrote:

> It could be handled that way, but we would be back to the original problem
> that 32-bit git fails for large files.  The convert code path currently
> assumes that all data is available in a single buffer at some point to apply
> crlf and ident filters.
> 
> If the initial filter, which is assumed to reduce the file size, fails, we
> could seek to 0 and read the entire file.  But git would then fail for large
> files with out-of-memory.  We would not gain anything for the use case that
> I describe in the commit message's first paragraph.

Ah. So the real problem is that we cannot handle _other_ conversions for
large files, and we must try to intercept the data before it gets to
them. So this is really just helping "reduction" filters. Even if our
streaming filter succeeds, it does not help the situation if it did not
reduce the large file to a smaller one.
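
To make the distinction concrete, here is a minimal sketch in Python (not
Git's C code) contrasting the single-buffer model with streaming from a file
descriptor into a "reduction" filter. The filter is modeled as a SHA-256
digest that turns arbitrarily large input into a short pointer line; the
names clean_whole_buffer, clean_streaming, and CHUNK_SIZE are hypothetical
and do not come from the patch.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from the patch


def clean_whole_buffer(path):
    """Single-buffer model: the whole file must fit in memory first."""
    with open(path, "rb") as f:
        data = f.read()  # peak memory grows with the file size
    return hashlib.sha256(data).hexdigest().encode() + b"\n"


def clean_streaming(fd, chunk_size=CHUNK_SIZE):
    """Streaming model: feed the filter chunk by chunk from a descriptor."""
    digest = hashlib.sha256()
    largest_read = 0
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # an empty read means end of file
            break
        largest_read = max(largest_read, len(chunk))
        digest.update(chunk)
    # The filter output is tiny, so later single-buffer steps stay cheap.
    return digest.hexdigest().encode() + b"\n", largest_read


def demo():
    """Run both models on a temporary file and return their results."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (CHUNK_SIZE * 3 + 17))
        path = tmp.name
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            streamed, largest_read = clean_streaming(fd)
        finally:
            os.close(fd)
        whole = clean_whole_buffer(path)
    finally:
        os.remove(path)
    return streamed, whole, largest_read
```

Both functions produce the same output, but only the streaming one keeps its
reads bounded by the chunk size. If the filter did not shrink the data, its
output would still have to sit in one buffer for any later conversion, which
is why only reduction filters benefit.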

It would be nice in the long run to let the other filters stream, too,
but that is not a problem we need to solve immediately. Your patch is a
strict improvement.
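
As a rough idea of what streaming another conversion could look like, the
sketch below converts CRLF to LF one chunk at a time in Python. It is only an
illustration under simplifying assumptions: crlf_to_lf_stream is a
hypothetical name, and the sketch ignores everything else Git's real crlf
handling does (such as deciding whether a file is text at all). The one
subtlety it shows is a CR at the end of a chunk, which has to be held back
until the next chunk reveals whether an LF follows.

```python
def crlf_to_lf_stream(chunks):
    """Yield the input chunks with every CRLF pair replaced by LF."""
    pending_cr = False  # True when the previous chunk ended in a bare CR
    for chunk in chunks:
        if pending_cr:
            # Re-attach the held-back CR so a split CRLF pair is seen whole.
            chunk = b"\r" + chunk
            pending_cr = False
        if chunk.endswith(b"\r"):
            # Hold the trailing CR back; the next chunk may start with LF.
            chunk = chunk[:-1]
            pending_cr = True
        if chunk:
            yield chunk.replace(b"\r\n", b"\n")
    if pending_cr:
        # A lone CR at the very end of the input is passed through as-is.
        yield b"\r"
```

Joining the output for any split of the input gives the same bytes as
running replace(b"\r\n", b"\n") on the whole input, while only one chunk
plus at most one held-back byte is in memory at a time.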

Thanks for the explanation; your approach makes a lot more sense to me
now.

-Peff
