From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Steffen Prohaska <prohaska@zib.de>,
Git Mailing List <git@vger.kernel.org>,
pclouds@gmail.com, john@keeping.me.uk, schacon@gmail.com
Subject: Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
Date: Tue, 26 Aug 2014 12:32:17 -0700
Message-ID: <xmqqmwarhwse.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20140826180018.GB17546@peff.net> (Jeff King's message of "Tue, 26 Aug 2014 14:00:18 -0400")

Jeff King <peff@peff.net> writes:

> On Mon, Aug 25, 2014 at 11:35:45AM -0700, Junio C Hamano wrote:
>
>> Steffen Prohaska <prohaska@zib.de> writes:
>>
>> >> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
>> >> obviously would not work for non-file inputs, but I think we address
>> >> that already in index_fd: we push non-seekable things off to index_pipe,
>> >> where we spool them to memory.
>> >
>> > It could be handled that way, but we would be back to the original problem
>> > that 32-bit git fails for large files.
>>
>> Correct, and you are making an incremental improvement so that such
>> a large blob can be handled _when_ the filters can successfully
>> munge it back and forth. If we fail due to out of memory when the
>> filters cannot, that would be the same as without your improvement,
>> so you are still making progress.
>
> I do not think my proposal makes anything worse than Steffen's patch.

I think we are saying the same thing, but perhaps I didn't phrase it
well.

> I think the main argument against going further is just that it is not
> worth the complexity. Tell people doing reduction filters they need to
> use "required", and that accomplishes the same thing.
>
>> >> So it seems like the ideal strategy would be:
>> >>
>> >> 1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
>> >>
>> >> 2. If it's not seekable and the filter is required, try streaming. We
>> >> die anyway if we fail.
>>
>> Puzzled... Is it assumed that any content the filters tell us to
>> use the contents from the db as-is by exiting with non-zero status
>> will always be large not to fit in-core? For small contents, isn't
>> this "ideal" strategy a regression?
>
> I am not sure what you mean by regression here. We will try to stream
> more often, but I do not see that as a bad thing.

I thought the proposed flow I was commenting on was

 - try streaming and die if the filter fails

For an optional filter working on contents that would fit in core,
we currently do

 - slurp in memory, filter it, use the original if the filter fails

If we switched to 2., then... ahh, ok, I misread "is required" part.
The "regression" does not apply to that case at all.
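
To make the cases discussed above easier to compare, here is a minimal sketch of the decision table in Python. It is only a model of the thread, not code from git: the names `Plan` and `choose_plan` are hypothetical, and the handling of a non-seekable input with an optional filter (spool to memory, keep the original on failure) is an assumption based on the index_pipe remark and the current in-core behavior described above.

```python
from enum import Enum


class Plan(Enum):
    # Required filter: stream the input; a filter failure is fatal anyway.
    STREAM_OR_DIE = "stream; die if the filter fails"
    # Optional filter on a seekable input: stream first, and if the filter
    # fails, go back to the start (lseek/mmap) and use the original.
    STREAM_THEN_REREAD = "stream; on failure re-read via lseek/mmap"
    # Optional filter on a non-seekable input (assumed): spool to memory,
    # filter it, and use the original if the filter fails.
    SLURP_IN_MEMORY = "slurp in memory; on failure use the original"


def choose_plan(seekable: bool, required: bool) -> Plan:
    """Pick a strategy for running a clean filter (hypothetical model)."""
    if required:
        # Nothing to fall back to: a required filter that fails means we die,
        # so the original content never has to be read a second time.
        return Plan.STREAM_OR_DIE
    if seekable:
        return Plan.STREAM_THEN_REREAD
    return Plan.SLURP_IN_MEMORY


if __name__ == "__main__":
    for seekable in (True, False):
        for required in (True, False):
            plan = choose_plan(seekable, required)
            print(f"seekable={seekable!s:5} required={required!s:5} -> {plan.value}")
```

In this model the "regression" concern only comes up in the optional-filter rows, where the original content must stay available after a failure; the required-filter rows never need it.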