From: Jeff King <peff@peff.net>
To: Avery Pennarun <apenwarr@gmail.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Peter Karlsson <peter@softwolves.pp.se>,
	git@vger.kernel.org
Subject: Re: Git on Windows, CRLF issues
Date: Mon, 21 Apr 2008 22:39:18 -0400
Message-ID: <20080422023918.GA5402@sigill.intra.peff.net>
In-Reply-To: <32541b130804211453x77f3fd49hef645a417a9919ca@mail.gmail.com>

On Mon, Apr 21, 2008 at 05:53:34PM -0400, Avery Pennarun wrote:

> Does anyone know the most efficient way to do this with
> git-filter-branch, when there are already thousands of files in the
> repo with CRLF in them?  Running dos2unix on all the files for every
> single revision could take a *very* long time.

Yes, a tree filter would probably be quite slow, since it has to check
out and then munge all of the files for every revision.

You could maybe do an index filter that gets the blob SHA1 of each file
that is new, and just munges those. But I think it is even simpler to
just keep a cache of original blob hashes mapping to munged blob hashes.

Something like:

  git filter-branch --index-filter '
    git ls-files --stage |
    perl /path/to/caching-munger |
    git update-index --index-info
  '

where your caching munger looks something like:

-- >8 --
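#!/usr/bin/perl
# Reads "git ls-files --stage" lines on stdin and writes
# "git update-index --index-info" lines on stdout, with each blob
# replaced by its munged version.

use strict;
use warnings;
use DB_File;
use Fcntl;

# Persistent map from original blob hash to munged blob hash.
tie my %cache, 'DB_File', "$ENV{HOME}/filter-cache", O_RDWR|O_CREAT, 0666
  or die "unable to open db: $!";

while(<>) {
  # Input format: <mode> SP <sha1> SP <stage> TAB <path>
  my ($mode, $hash, $path) = /^(\d+) ([0-9a-f]{40}) \d\t(.*)/
    or die "bad ls-files line: $_";
  # Only munge blobs we have not seen before.
  $cache{$hash} = munge($hash)
    unless exists $cache{$hash};
  print "$mode $cache{$hash}\t$path\n";
}

# Write a munged copy of blob $h into the object database and return
# its hash. Appending "\r" through sed this way relies on GNU sed.
sub munge {
  my $h = shift;
  my $r = scalar `git show $h | sed 's/\$/\\r/' | git hash-object -w --stdin`;
  chomp $r;
  # Refuse to cache anything that is not a full object name.
  die "munge failed for $h\n" unless $r =~ /^[0-9a-f]{40}$/;
  return $r;
}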
-- 8< --

so we keep a dbm of the hash mapping, and do no work if we have already
seen this blob. If we haven't, then we actually do the expensive 'show |
munge | hash-object'. Here our munge adds a CR, but you should be
able to do an arbitrary transformation.
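
For the dos2unix direction that was actually asked about, the munge step
might look roughly like the sketch below. This is untested against a real
history. The function names strip_cr and munge_blob are made up for
illustration, and it assumes every blob is text that you really do want
converted (binary files would need to be skipped somehow).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of git.

# Remove one trailing CR from each line read on stdin.
# A literal CR is spliced into the sed expression so that this does
# not depend on sed understanding the \r escape.
strip_cr() {
  cr=$(printf '\r')
  sed "s/${cr}\$//"
}

# Print the hash of a CR-stripped copy of blob $1, writing the new
# blob into the object database along the way.
munge_blob() {
  git cat-file blob "$1" | strip_cr | git hash-object -w --stdin
}
```

The same idea could go directly into the backtick pipeline in munge() in
place of the sed that adds the CR; the caching around it would not need to
change.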

-Peff

