From: Junio C Hamano <gitster@pobox.com>
To: Steffen Prohaska <prohaska@zib.de>
Cc: Jeff King <peff@peff.net>, Git Mailing List <git@vger.kernel.org>,
	pclouds@gmail.com, john@keeping.me.uk, schacon@gmail.com
Subject: Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
Date: Mon, 25 Aug 2014 11:35:45 -0700
Message-ID: <xmqq4mx0mn7i.fsf@gitster.dls.corp.google.com>
In-Reply-To: <E23693B7-0D9D-477D-A303-4A68433EAB79@zib.de> (Steffen Prohaska's message of "Mon, 25 Aug 2014 18:55:51 +0200")

Steffen Prohaska <prohaska@zib.de> writes:

>> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
>> obviously would not work for non-file inputs, but I think we address
>> that already in index_fd: we push non-seekable things off to index_pipe,
>> where we spool them to memory.
>
> It could be handled that way, but we would be back to the original problem
> that 32-bit git fails for large files.

Correct, and you are making an incremental improvement so that such
a large blob can be handled _when_ the filters can successfully
munge it back and forth.  If we fail due to out of memory when the
filters cannot, that would be the same as without your improvement,
so you are still making progress.
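
To illustrate why streaming sidesteps the 32-bit limit, here is a
minimal sketch (not the code from the patch) of feeding a filter from
an fd through a fixed-size buffer.  The helper name stream_fd() and
the 8192-byte buffer are assumptions made for this example only; the
point is that memory use stays constant no matter how large the blob
is, whereas an mmap of the whole file needs that much address space.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything from `in` to `out` through a fixed-size buffer.
 * Returns 0 on success, -1 on a read or write error.
 * Hypothetical helper for illustration, not Git's implementation.
 */
static int stream_fd(int in, int out)
{
	char buf[8192];

	for (;;) {
		ssize_t off = 0;
		ssize_t n = read(in, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted, retry the read */
			return -1;
		}
		if (n == 0)
			return 0; /* end of input */

		/* write() may be partial, so loop until the chunk is flushed */
		while (off < n) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue; /* interrupted, retry the write */
				return -1;
			}
			off += w;
		}
	}
}
```

In a real caller, `out` would be the write end of a pipe to the filter
process, and a failure here would be fatal only when the filter is
required.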

> To implement something like the ideal strategy below, the entire convert 
> machinery for crlf and ident would have to be converted to a streaming
> approach.

Yes, that has always been the longer term vision since the day the
streaming infrastructure was introduced.

>> So it seems like the ideal strategy would be:
>> 
>>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
>> 
>>  2. If it's not seekable and the filter is required, try streaming. We
>>     die anyway if we fail.

Puzzled...  Is it assumed that any content for which the filter tells
us, by exiting with non-zero status, to use the contents from the db
as-is will always be too large to fit in-core?  For small contents,
isn't this "ideal" strategy a regression?

>>  3. If it's not seekable and the filter is not required, decide based
>>     on file size:
>> 
>>       a. If it's small, spool to memory and proceed as we do now.
>> 
>>       b. If it's big, spool to a seekable tempfile.
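
For reference, the three-way decision quoted above can be written
down as a small sketch.  Everything named here (enum input_plan,
fd_is_seekable(), choose_plan(), and the big_threshold parameter) is
hypothetical and does not exist in Git; how the size of a
non-seekable input would be learned is left open, so `size` is simply
passed in by the caller.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for the outcomes of the quoted strategy. */
enum input_plan {
	PLAN_STREAM_FALLBACK_MMAP, /* 1.  seekable: stream, else lseek/mmap */
	PLAN_STREAM_OR_DIE,        /* 2.  not seekable, filter required */
	PLAN_SPOOL_TO_MEMORY,      /* 3a. not seekable, optional, small */
	PLAN_SPOOL_TO_TEMPFILE     /* 3b. not seekable, optional, big */
};

/* lseek() fails (ESPIPE) on pipes, FIFOs and sockets. */
static int fd_is_seekable(int fd)
{
	return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/*
 * Pick a plan from the three inputs the quoted list depends on.
 * `big_threshold` is an assumed cut-off; the thread does not name one.
 */
static enum input_plan choose_plan(int seekable, int filter_required,
				   size_t size, size_t big_threshold)
{
	if (seekable)
		return PLAN_STREAM_FALLBACK_MMAP;
	if (filter_required)
		return PLAN_STREAM_OR_DIE;
	if (size < big_threshold)
		return PLAN_SPOOL_TO_MEMORY;
	return PLAN_SPOOL_TO_TEMPFILE;
}
```

Note that case 1 ignores the size altogether, which is where the
question about small contents comes from.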

Thread overview: 15+ messages
2014-08-24 16:07 [PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong() Steffen Prohaska
2014-08-24 16:07 ` [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path' Steffen Prohaska
2014-08-25 22:55   ` Junio C Hamano
2014-08-24 16:07 ` [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong() Steffen Prohaska
2014-08-25 11:38   ` Jeff King
2014-08-25 15:06     ` Steffen Prohaska
2014-08-25 15:12       ` Jeff King
2014-08-24 16:07 ` [PATCH v5 3/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size Steffen Prohaska
2014-08-24 16:07 ` [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap Steffen Prohaska
2014-08-25 12:43   ` Jeff King
2014-08-25 16:55     ` Steffen Prohaska
2014-08-25 18:35       ` Junio C Hamano [this message]
2014-08-26 18:00         ` Jeff King
2014-08-26 19:32           ` Junio C Hamano
2014-08-26 17:54       ` Jeff King
