git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: [BUG?] bulk checkin does not respect filters
Date: Thu, 23 Feb 2012 22:02:45 -0500	[thread overview]
Message-ID: <20120224030244.GA15742@sigill.intra.peff.net> (raw)

If I do this:

  git init repo &&
  cd repo &&
  echo foo >small &&
  cat small small small small >large &&
  echo '* filter=foo2bar' >.gitattributes &&
  git config filter.foo2bar.clean 'sed s/foo/bar/' &&
  git config core.bigfilethreshold 10 &&
  git add . &&
  echo "===> small" && git cat-file blob :small
  echo "===> large" && git cat-file blob :large

the output I get is:

  ===> small
  bar
  ===> large
  foo
  foo
  foo
  foo

I.e., the clean filter is not applied to the bulk checkin file. Nor can
it be easily, because we need to know the size of the file to write the
blob header, and we don't know that until we see all of the filter's
output.

In practice, I don't know if this is a huge deal, as people aren't going
to be asking to de-CRLF files that actually cross the 512M
bigfilethreshold (OTOH, I seem to recall there are some filters floating
around for normalizing gzip'd files, which could plausibly be gigantic).

But it seems like the right choice when we see this conflict is not
"don't do filters for streaming checkin", but rather "don't do streaming
checkin when filters are in use" (because streaming is an optimization,
and filters are about correctness).

It would be even nicer if filters could play well with bulk checkin, but
I think that would involve streaming to a tempfile, checking the size of
the file, and then streaming that into an object. Which is better than
putting the whole thing in memory if it would involve swapping, but
probably worse than doing so if you can possibly fit the whole thing in
(because you're doing a ton of extra I/O for the tempfile).

Thoughts? Was this intentional, or just overlooked?

-Peff

             reply	other threads:[~2012-02-24  3:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-24  3:02 Jeff King [this message]
2012-02-24  3:17 ` [BUG?] bulk checkin does not respect filters Junio C Hamano
2012-02-24  3:42   ` Junio C Hamano
2012-02-24  7:54     ` Jeff King
2012-02-24 18:48       ` Junio C Hamano
2012-02-24  8:28   ` Jeff King
2012-02-24  9:39     ` Jeff King
2012-02-24  9:41       ` [PATCH 1/2] teach convert_to_git a "dry run" mode Jeff King
2012-02-24  9:48       ` [PATCH 2/2] do not stream large files to pack when filters are in use Jeff King
2012-02-24 20:03         ` Junio C Hamano
2012-02-24 20:48           ` Jeff King
2012-02-24 21:01             ` Jeff King
2012-02-24 21:20               ` Junio C Hamano
2012-02-24 21:19             ` Jeff King
2012-02-24 22:02               ` [PATCHv2 1/3] teach convert_to_git a "dry run" mode Jeff King
2012-02-24 22:05               ` [PATCHv2 2/3] teach dry-run convert_to_git not to require a src buffer Jeff King
2012-02-24 22:10               ` [PATCHv2 3/3] do not stream large files to pack when filters are in use Jeff King
2012-02-24 22:42               ` [PATCH 2/2] " Junio C Hamano
2012-02-24 20:03       ` [BUG?] bulk checkin does not respect filters Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120224030244.GA15742@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).